GOV.UK CDN static mirrors
The GOV.UK mirror is a static copy of GOV.UK.
When is the GOV.UK mirror used?
When a user requests a GOV.UK page, Fastly retrieves that page from its cache, or fetches the page from GOV.UK Origin if Fastly does not have the page in its cache.
Sometimes, GOV.UK Origin may time out or return a 5xx error response. When that happens, Fastly automatically fetches the page from the GOV.UK mirror instead.
If Fastly goes down, we would manually switch to AWS CloudFront instead of Fastly. Where Fastly makes requests to GOV.UK Origin, AWS CloudFront instead makes all its requests to the GOV.UK mirror.
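The routing described above can be summarised in a small shell sketch. It is illustrative only: backend_on_miss is a hypothetical helper, not part of any GOV.UK tooling, and the real behaviour lives in the CDN configuration. It models which backend a CDN asks on a cache miss, given the Origin status code (or the word timeout).

```shell
# Hypothetical helper: which backend serves a request on a CDN cache miss?
#   $1 = CDN in use ("fastly" or "cloudfront")
#   $2 = Origin result: an HTTP status code, or the word "timeout"
backend_on_miss() {
  cdn="$1"
  origin_status="$2"

  # CloudFront sends all of its backend requests to the mirror.
  if [ "$cdn" = "cloudfront" ]; then
    echo "mirror"
    return 0
  fi

  # Fastly falls back to the mirror on a timeout or any 5xx response.
  case "$origin_status" in
    timeout|5[0-9][0-9]) echo "mirror" ;;
    *)                   echo "origin" ;;
  esac
}
```

For example, backend_on_miss fastly 503 prints mirror, while backend_on_miss fastly 200 prints origin.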
GOV.UK mirror locations and access
The GOV.UK mirror is hosted in an Amazon Web Services (AWS) S3 bucket. The bucket contains copies of GOV.UK HTML files. The mirror is static, meaning dynamic pages such as search pages will not work. S3 will return a 403 Forbidden response for non-existent pages instead of a 404, because we don’t allow the ListBucket permission. This also affects requests for features such as Search, which are not covered by the mirrors.
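Because a missing page shows up as a 403 rather than a 404, status codes from the mirror need a little interpretation. The sketch below is one way to do that. Both function names are hypothetical, and the URL you pass to http_status is a placeholder you would need to supply yourself, from an IP address that is allowed to read the bucket.

```shell
# Hypothetical helper: print only the HTTP status code for a URL.
# -s hides progress output, -o /dev/null discards the body,
# -w '%{http_code}' prints the numeric status.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Hypothetical helper: explain a status code returned by the mirror.
explain_mirror_status() {
  case "$1" in
    200) echo "page is in the mirror" ;;
    403) echo "page is missing from the mirror, or access is denied" ;;
    *)   echo "unexpected status: $1" ;;
  esac
}
```

Usage, assuming MIRROR_URL holds the address of a mirrored page: explain_mirror_status "$(http_status "$MIRROR_URL")".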
The term “GOV.UK mirror” actually refers to 3 separate mirrors:
- the main govuk-<environment>-mirror S3 bucket in one AWS region
- the govuk-<environment>-mirror-replica S3 bucket in another region, so if the first AWS region is down, we can fall back to this other region
- the govuk-<environment>-mirror bucket in Google Cloud Storage (GCS), so if AWS overall is down, we can fall back to this mirror
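As a quick reference, the naming pattern above can be expanded for a given environment. This is a sketch only: mirror_buckets is a hypothetical helper, and the s3:// and gs:// prefixes are added just to show which provider each bucket lives in. The output order matches the fallback order described in the list.

```shell
# Hypothetical helper: list the three mirror buckets for an environment,
# in fallback order (main, replica in another AWS region, GCS).
mirror_buckets() {
  environment="$1"
  echo "s3://govuk-${environment}-mirror"
  echo "s3://govuk-${environment}-mirror-replica"
  echo "gs://govuk-${environment}-mirror"
}
```

Running mirror_buckets production prints the three production bucket names, one per line.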
Access to the Amazon S3 buckets is restricted. If you have a Fastly, Office or Pingdom IP address, you have read-only access. If you’re an authenticated AWS web console user, you have read-write access.
Access to the GCS bucket is also restricted. You can access the GCS bucket if you can access the secret keys in govuk-secrets, or if you’re an authenticated Google Cloud Platform (GCP) web console user.
Updates to the GOV.UK mirror
Automatic scripts update the GOV.UK mirror every day.
govuk_crawler_worker on the mirrorer machine:
- consumes the fetched GOV.UK URLs from the RabbitMQ message queue
- retrieves the GOV.UK HTML files returned by these URLs
- saves these HTML files to a /tmp folder on the mirrorer machine
- adds any new URLs found on those pages to the back of the RabbitMQ message queue
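The consume, fetch, save and enqueue cycle in the list above can be illustrated with a local stand-in. This is not how govuk_crawler_worker is implemented: the sketch assumes each "page" is a file in a local directory that simply lists the names of the pages it links to, one per line, and it uses the shell's positional parameters as the queue instead of RabbitMQ. The function name crawl is hypothetical.

```shell
# Hypothetical local simulation of the crawl loop.
#   $1 = directory of "pages" (each file lists linked page names, one per line)
#   $2 = output directory for saved copies
#   $3... = initial queue of page names
crawl() {
  site="$1"
  out="$2"
  shift 2            # the remaining positional parameters are the queue
  seen=""
  mkdir -p "$out"

  while [ "$#" -gt 0 ]; do
    url="$1"         # consume from the front of the queue
    shift

    # Skip pages that have already been processed.
    case " $seen " in
      *" $url "*) continue ;;
    esac
    seen="$seen $url"

    # Skip links that point at pages which do not exist.
    if [ ! -f "$site/$url" ]; then
      continue
    fi

    # "Retrieve" the page and save a copy.
    cp "$site/$url" "$out/$url"

    # Add every link found on the page to the back of the queue.
    while IFS= read -r link || [ -n "$link" ]; do
      if [ -n "$link" ]; then
        set -- "$@" "$link"
      fi
    done < "$site/$url"
  done
}
```

Tracking the pages already seen is what lets the loop terminate when pages link back to each other.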
Every hour, the govuk_sync_mirror script runs on the mirrorer machine. This script copies the GOV.UK HTML files from the mirrorer machine to the main govuk-<environment>-mirror AWS S3 bucket. AWS then copies this main bucket to the replica govuk-<environment>-mirror-replica S3 bucket in another region.
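To make the hourly copy concrete, the sketch below prints the kind of command such a sync could run. It is an assumption, not the contents of govuk_sync_mirror: the real script may use different tooling, options or destination prefixes. The helper name print_sync_command is hypothetical, and the source directory is the one used elsewhere on this page to inspect the mirrorer machine. The helper only prints the command, so it is safe to run without AWS credentials.

```shell
# Hypothetical helper: print (but do not run) an "aws s3 sync" command that
# would copy the crawled files to the main mirror bucket for an environment.
print_sync_command() {
  environment="$1"
  # Assumed source: the crawler output directory on the mirrorer machine.
  source_dir="/mnt/crawler_worker/www.gov.uk"
  echo "aws s3 sync ${source_dir} s3://govuk-${environment}-mirror"
}
```

For example, print_sync_command production prints the command for the production bucket, which you can review before deciding whether to run anything.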
Finally, a job is run in Google Cloud Storage (GCS) at 6:00pm the day after the original govuk_seed_crawler script is run. This job syncs the primary AWS S3 bucket govuk-<environment>-mirror to the GCS govuk-<environment>-mirror bucket. For more information on GCS, see the Google Cloud Platform documentation.
govuk_crawler_worker scripts are independent of the mirrors. Stopping these scripts stops the mirror updates, but does not stop the mirrors from working.
Run the following to inspect the contents of the GOV.UK mirrorer machine:
gds govuk connect -e production ssh mirrorer
cd /mnt/crawler_worker/www.gov.uk
Mirror GOV.UK content to S3 alert
If the govuk_sync_mirror cronjob has not succeeded for 24 hours, it triggers the ‘Mirror GOV.UK content to S3’ alert. See the Mirror GOV.UK content to S3 alert documentation for more information.
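The 24-hour rule behind this alert amounts to a simple age check. The sketch below shows the arithmetic only. sync_alert_state is a hypothetical helper, and how the real monitoring records the last successful run is not covered here. Both arguments are Unix timestamps in seconds.

```shell
# Hypothetical helper: report whether the last successful sync is too old.
#   $1 = timestamp of the last successful sync (seconds since the epoch)
#   $2 = current timestamp (seconds since the epoch)
sync_alert_state() {
  last_success="$1"
  now="$2"
  threshold=$((24 * 60 * 60))   # 24 hours in seconds

  if [ $((now - last_success)) -gt "$threshold" ]; then
    echo "alert"
  else
    echo "ok"
  fi
}
```

Usage, assuming LAST_SUCCESS holds the recorded timestamp: sync_alert_state "$LAST_SUCCESS" "$(date +%s)".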
Forcing failover to the GOV.UK mirrors
If Origin is unavailable, Fastly will automatically retry every request against the mirrors.
To avoid Fastly traffic hitting Origin when Origin is down (potentially making the problem worse), we can fall back to AWS CloudFront, which serves all content using the GOV.UK mirrors.
Alternatively, we can stop Nginx on the cache machines, which will prevent requests hitting GOV.UK applications. Fastly will automatically retry these failed requests against the mirror.
SSH into each cache machine (you can increment box number after the colon to hit each one in turn):
$ gds govuk connect -e production ssh cache:1
Stop Nginx to force use of mirrors:
$ govuk_puppet --test --disable "fail_to_mirror task (by $USER)"
$ sudo service nginx stop
When required you can re-enable puppet, which will restart Nginx:
$ govuk_puppet --test --enable
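The SSH step above asks you to increment the box number to reach each cache machine in turn. A small sketch can list those connection commands up front. The helper name cache_ssh_commands is hypothetical, and the number of cache machines is an assumption you must supply, because this page does not state how many there are.

```shell
# Hypothetical helper: print the connect command for cache boxes 1..N.
#   $1 = number of cache machines (an assumption supplied by the caller)
cache_ssh_commands() {
  count="$1"
  i=1
  while [ "$i" -le "$count" ]; do
    echo "gds govuk connect -e production ssh cache:${i}"
    i=$((i + 1))
  done
}
```

For example, cache_ssh_commands 3 prints three commands, ending with the one for cache:3.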
Emergency publishing content using the GOV.UK mirror
The escalation on-call contact will tell you if you need to make changes to GOV.UK while Origin is unavailable. To do this, you must change content on the GOV.UK mirrors. Because the mirror is static HTML, it’s hard to make broad changes to the site, like putting a banner on every page.
If you’re outside of GDS premises, connect to the VPN.
SSH to the mirrorer machine:
gds govuk connect -e production ssh mirrorer
Disable puppet on the machine:
govuk_puppet --disable "stopping crawling to avoid mirror changes"
Stop the crawler worker:
initctl stop govuk_crawler_worker
Modify the relevant file in the
Upload the file to the AWS S3 bucket using the AWS console.
Upload the file to Google Cloud Storage using the GCP console. Credentials are in the govuk-secrets password store, under
If you’re notified that you can revert the change you’ve made, you can do this by following the same emergency publishing process.
Once Origin becomes available again, update Origin to reflect the change you made on the mirror.