GOV.UK content mirrors
A GOV.UK mirror is a static copy of pages and assets hosted on www.gov.uk or assets.publishing.service.gov.uk (or equivalent domains in integration and staging). A mirror includes:
- HTML pages
- other linked assets (or “attachments”), such as CSVs and PDFs
We maintain three mirrors, ranked by priority:
- Primary: AWS S3 bucket named
- Secondary: AWS S3 bucket named govuk-<environment>-mirror-replica in eu-west-1 (production only)
- Tertiary: Google Cloud Storage (GCS) bucket named
We use multiple mirrors across various AWS regions and GCP to ensure redundancy and increase availability.
When is the GOV.UK mirror used?
If Fastly, our primary CDN, cannot fetch a page from our backend servers (because of a timeout or a 5xx error), it attempts to serve the page from the mirrors in order of priority.
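The fallback order described above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the real Fastly configuration: the backend labels, the `choose_backend` function and the `fetch` callback are hypothetical names, and the sketch assumes that a mirror which itself times out or returns a 5xx is skipped in favour of the next one.

```python
from typing import Callable, Optional, Sequence

# Hypothetical labels for the three mirrors, highest priority first.
MIRRORS = ("primary-s3", "secondary-s3", "tertiary-gcs")

# A fetcher takes (backend, path) and returns an HTTP status code,
# or None to model a timeout.
Fetcher = Callable[[str, str], Optional[int]]


def is_failure(status: Optional[int]) -> bool:
    """Treat a timeout (None) or any 5xx status as a failed fetch."""
    return status is None or 500 <= status <= 599


def choose_backend(
    fetch: Fetcher, path: str, mirrors: Sequence[str] = MIRRORS
) -> Optional[str]:
    """Return the first backend that answers without a failure.

    The origin is tried first, then each mirror in priority order.
    Returns None if every backend fails.
    """
    for backend in ("origin", *mirrors):
        if not is_failure(fetch(backend, path)):
            return backend
    return None


if __name__ == "__main__":
    # Simulated outage: the origin returns a 503 and the primary mirror times out.
    statuses = {
        "origin": 503,
        "primary-s3": None,
        "secondary-s3": 200,
        "tertiary-gcs": 200,
    }
    print(choose_backend(lambda backend, path: statuses[backend], "/"))
```

In the simulated outage, the secondary mirror is selected because it is the first backend in priority order that responds without a timeout or 5xx error.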
How are the mirrors populated?
Every day the govuk-mirror-sync cronjob crawls the www and assets domains, saves pages and assets to disk and then uploads the files to the primary S3 bucket. The govuk-mirror repository contains the code responsible for crawling and saving pages to disk.
S3 Replication automatically copies any changes from the primary S3 bucket to the secondary S3 bucket. This is configured in govuk-aws.
GCP Storage Transfer Service copies any changes from the primary S3 bucket to the tertiary GCS bucket.
What is not covered by mirrors?
Certain page types aren’t included in the mirrors:
- Smart answer pages (as the govuk-mirror crawler doesn’t support following form links)
- CSV preview pages
Check the logs of the govuk-mirror-sync job in Argo to see whether there were any errors during crawling, saving pages or uploading to S3.
Check buckets in AWS S3 or GCP to see if they are populated.
You can fetch pages directly from the mirrors by specifying the Backend-Override header, e.g.
curl -H 'Backend-Override: mirrorS3' https://www.gov.uk
The allowed values are