Skip to main content
Last updated: 17 Nov 2025

GOV.UK content mirrors

A GOV.UK mirror is a static copy of pages and assets hosted on www.gov.uk or assets.publishing.service.gov.uk (or equivalent domains in integration and staging). A mirror includes:

  • HTML pages
  • related assets for those pages (e.g. JavaScript, CSS, images, fonts)
  • other linked assets (or “attachments”) such as CSVs, PDFs etc

Architecture

How content flows through the mirror system

The GOV.UK mirror system operates in two main flows:

1. Content Population Flow (Nightly)

The govuk-mirror-sync cronjob crawls content, saves it locally, and distributes it across multiple storage locations for redundancy.

flowchart TB
    A[govuk-mirror-sync<br/>cronjob in Argo] --> B[Crawl www &<br/>assets domains]
    B --> C[Save pages<br/>to local disk]
    C --> D[Upload files to Primary S3 Bucket]
    D --> E[Primary S3<br/>govuk-env-mirror<br/>eu-west-2]
    E --> |"Production Only"| F[S3 Replication]
    E --> |"Staging & Production"|H[GCS Transfer<br/>Service]
    F --> G[Secondary S3<br/>govuk-env-mirror-replica<br/>eu-west-1<br/>Production only]
    H --> I[Tertiary GCS<br/>govuk-env-mirror<br/>Staging & Production]

2. Content Serving Flow (On Origin Failure)

When Fastly cannot reach our origin servers, it attempts to serve content from mirrors in priority order.

flowchart TB
    I[User Request] --> J[Fastly CDN]
    J --> K{Fetch<br/>from Origin}
    K -->|"200 OK"| L[Serve from<br/>Origin]
    K -->|"5xx/timeout"| T{Stale Cache<br/>Available?<br/>up to 24h}
    T -->|Yes| U[Serve Stale<br/>from Cache]
    T -->|No| M[Fall back to<br/>Primary S3 Mirror]
    M --> N{Success?}
    N -->|Yes| O[Serve from<br/>Primary]
    N -->|No| P[Try Secondary<br/>S3 Mirror]
    P --> Q{In Production<br/>and Successful?}
    Q -->|Yes| R[Serve from<br/>Secondary]
    Q -->|No| S[Try Tertiary<br/>GCS Mirror]

Available mirrors

We maintain three mirrors, ranked by priority:

  1. Primary: AWS S3 bucket named govuk-<environment>-mirror in eu-west-2
  2. Secondary: AWS S3 bucket named govuk-<environment>-mirror-replica in eu-west-1 (production only)
  3. Tertiary: Google Cloud Storage (GCS) bucket named govuk-<environment>-mirror (staging and production only)

We use multiple mirrors across various AWS regions and GCP to ensure redundancy and increase availability.

Having our primary mirror in eu-west-2 is a conscious decision to improve redundancy in the instance that a problem with eu-west-1 is the cause of an outage that has impacted our compute capacity.

GCP (GCS) Mirror

Accessing the GCP Project

Each environment’s Google Cloud resources live inside the relevant Project for the environment, e.g. “GOVUK Integration”, “GOVUK Staging”, etc.

If you want to access the GCP resources for the Integration environment, first make sure you have access to GCP and then you can access these through the Google Cloud Console by navigating to:

Open Project Picker (⌘+O) -> GOVUK Integration

Then you can access the Bucket and Storage Transfer Jobs.

GCS Bucket

For example, in Integration, you can access these through the Google Cloud Console by navigating to:

Cloud Storage -> Buckets -> govuk-integration-mirror

Once there you should find the relevant directory structure for assets and www.gov.uk.

Storage Transfer Job

To find the job that manages the transfer from the AWS S3 Bucket to the GCS Bucket, navigate to:

Storage transfer -> Transfer jobs

From there, you should see all the configured transfer jobs (currently just one at the time of writing) and can view the configuration and history of them, such as:

  • Bytes copied
  • Bandwidth
  • Errors
  • Number of objects
  • Daily run histoy

You may find the Terraform that configures the Storage Transfer here.

When is the GOV.UK mirror used?

If Fastly, our primary CDN, cannot fetch a page from our backend servers (because of a timeout or a 5xx error), then Fastly will attempt to serve a page from a mirror in order of priority.

Fastly failover behavior

When the origin is unavailable:

  1. Fastly first attempts to serve the page from the primary S3 mirror (eu-west-2)
  2. If that fails, it tries the secondary S3 replica (eu-west-1, production only)
  3. As a final fallback, it attempts the tertiary GCS mirror

This layered approach ensures maximum availability even if individual mirror buckets have issues.

How are the mirrors populated?

Every night the govuk-mirror-sync cronjob crawls the www and assets domains, saves pages and assets to disk and then uploads the files to the primary S3 bucket. The govuk-mirror repository contains the code responsible for crawling and saving pages to disk.

S3 Replication automatically copies any changes from the primary S3 bucket to the secondary S3 bucket. This is configured in govuk-aws.

GCP Storage Transfer Service copies any changes from the primary S3 bucket to the tertiary GCS bucket.

What is not covered by mirrors?

Certain page types aren’t included in the mirrors:

  • Smart answer pages (as the govuk-mirror crawler doesn’t support following form links)
  • CSV preview pages

Monitoring & Health Checks

Success signals

A healthy mirror sync can be verified through:

Job completion time:

Content freshness:

  • Pages published on a given day should appear in the mirror within 24 hours (by the next nightly sync)
  • Spot-check recently published content by fetching directly from mirrors using the Backend-Override header

Storage verification:

  • Check that all three mirror buckets (primary S3, secondary S3, tertiary GCS) contain recently updated objects
  • Verify object counts and total storage sizes are roughly consistent across mirrors

Content-level checks

To verify mirror content is up to date:

  1. Identify a page published in the last 24 hours from the publishing pipeline
  2. Fetch it from the primary mirror: curl -H 'Backend-Override: mirrorS3' https://www.gov.uk/path/to/page
  3. Confirm the page content matches what’s expected

Failure Indicators

Signs that mirrors may need investigation

Job-level failures:

  • govuk-mirror-sync job fails to complete in Argo
  • Job runs significantly longer or slower than usual (more than 9 hours)
  • Job finishes much quicker than expected (e.g. 30 minutes)
  • Error messages in job logs indicating crawl failures or upload issues

Storage-level issues:

  • One or more mirror buckets are empty or contain significantly fewer objects than expected
  • Last modified timestamps on bucket objects are more than 24-48 hours old
  • S3 replication or GCS transfer metrics show failures

Serving-level problems:

  • Users report stale content when origin is supposedly healthy
  • Direct mirror fetches return 404s for content that should exist

Investigation steps

If you suspect mirror issues:

  1. Check the mirror sync job: Review LogIt for errors during crawl, save, or upload phases - use field selector: kubernetes.labels.app_kubernetes_io/name = mirror
  2. Verify bucket contents: Check AWS S3 Console and GCP Console to confirm buckets are populated and recently updated
  3. Test direct access: Use curl with Backend-Override header to fetch from each mirror and verify responses
  4. Review Fastly logs: Check if Fastly is falling back to mirrors unexpectedly
  5. Check replication status: Verify S3 replication metrics and GCS transfer job status

Troubleshooting

How to poke it

You can test mirrors directly during incidents or investigation:

Fetch a page from each mirror:

# Primary S3 mirror (eu-west-2)
curl -H 'Backend-Override: mirrorS3' https://www.gov.uk/

# Secondary S3 replica (eu-west-1, production only)
curl -H 'Backend-Override: mirrorS3Replica' https://www.gov.uk/

# Tertiary GCS mirror
curl -H 'Backend-Override: mirrorGCS' https://www.gov.uk/

Check for specific content:

# Fetch a recently published page
curl -H 'Backend-Override: mirrorS3' https://www.gov.uk/government/publications/example

# Fetch an asset
curl -H 'Backend-Override: mirrorS3' https://assets.publishing.service.gov.uk/path/to/asset.pdf

The allowed values for Backend-Override are mirrorS3, mirrorS3Replica and mirrorGCS.

Common checks

Review job logs:

Check the logs in LogIt for any errors during crawling, saving pages or uploading to S3.

You can filter the correct logs by using the following field selector: kubernetes.labels.app_kubernetes_io/name = mirror.

Verify bucket contents:

  1. AWS S3: Check the AWS Console for buckets govuk-<environment>-mirror and govuk-<environment>-mirror-replica
  2. GCP: Check the GCP Console for bucket govuk-<environment>-mirror
  3. Look for recent object modification times and verify object counts

Check replication:

  • S3 replication metrics in AWS Console
  • GCS Storage Transfer Service job status in GCP Console

Known Gaps & Potential Future Work

Areas where the mirror and related components could be improved:

Monitoring gaps:

  • We don’t currently have automated alerts for mirror sync failures or unusual runtimes
  • Content freshness is checked manually rather than automatically
  • No automated verification that all three mirrors contain consistent content

Operational improvements:

  • Better visibility into which mirror Fastly is actually serving from during failover scenarios
  • Clearer runbook for what to do if all mirrors are stale or unavailable
  • Differential or incremental changes to the sync script to reduce the time and costs required