Last updated: 29 Jul 2021

Use GOV.UK Mirror

The GOV.UK mirror is a static version of the entire GOV.UK website. A crawler generates the mirror every hour by navigating through most of the site and saving each HTML page it visits. See the Developer docs on updating the GOV.UK mirror for more information.

The GOV.UK mirror provides a static backup of GOV.UK as a fallback, and is not designed for analytics.

The following example pages show which page types the crawler caches and which it does not.

Smart answer start pages and their first pages, such as the Child Benefit Tax calculator, are included in the mirror. However, the remaining pages of each smart answer are missing because they require user input.

Internal search result pages are not stored. This applies both to result pages that require user input and to pre-populated internal search pages, such as the “Services” internal search page.

Local transaction result pages, such as the “Request a repair to a council property” page, are not cached as they require user input.

For special checkers like the Brexit checker, only the start, first question and results pages are stored, because the intermediate question pages are identified by query strings. The stored results page has no useful content, as it is generated without any user input.

Get access to the GOV.UK mirror

Before you start, you must have:

  • access to AWS
  • installed the GDS command line tools

See the Get started on GOV.UK developer documentation for more information on how to do this.

Copy GOV.UK mirror from AWS S3 bucket

The GOV.UK mirror is stored in the AWS govuk-production-mirror-replica S3 bucket.

To work with the GOV.UK mirror remotely, you should copy the GOV.UK mirror from the govuk-production-mirror-replica bucket to another bucket.

The following content assumes that you want to copy the GOV.UK mirror to the govuk-data-infrastructure-integration S3 bucket.

  1. Sign in to AWS.
  2. Select your name in the top right of the screen and select Switch roles.
  3. Under Account, select govuk-infrastructure-integration or 210287912431.
  4. Under Role, select govuk-datascienceusers.
  5. You can enter any text into Display name or leave this field empty.
  6. You can select any colour in Colour. Best practice is to select green for integration, amber for staging and red for production.
  7. Select Switch Role.
  8. Run the following in your command line:

    gds aws govuk-integration-datascience --assume-role-ttl 480m aws s3 sync s3://govuk-production-mirror-replica/www.gov.uk s3://govuk-data-infrastructure-integration/{YYYYMMDD}-govuk-production-mirror-replica
    

    where {YYYYMMDD} is today’s date in year-month-day format.

    The --assume-role-ttl 480m option allows up to 8 hours (480 minutes) for the transfer between the two S3 buckets. Because aws s3 sync only copies files that are missing or have changed at the destination, you can rerun the same command after an error to resume the transfer from where it stopped.
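
The {YYYYMMDD} placeholder in the command above can be filled in by the shell rather than typed by hand. The following is a minimal sketch, assuming a bash-compatible shell with the standard date utility. The function name mirror_dest_uri is illustrative and is not part of the GDS command line tools.

```shell
# Illustrative helper (not part of the GDS tooling): print the dated
# destination URI used by the sync command in this section.
mirror_dest_uri() {
  # Use the YYYYMMDD stamp passed as $1, or today's date if none is given.
  local stamp="${1:-$(date +%Y%m%d)}"
  printf 's3://govuk-data-infrastructure-integration/%s-govuk-production-mirror-replica\n' "$stamp"
}

# Possible usage. Adding --dryrun to aws s3 sync lists what would be
# copied without transferring anything, which is a cheap first check:
#
#   gds aws govuk-integration-datascience --assume-role-ttl 480m \
#     aws s3 sync --dryrun \
#     s3://govuk-production-mirror-replica/www.gov.uk "$(mirror_dest_uri)"
```

Passing an explicit stamp, for example mirror_dest_uri 20210729, returns the URI of an earlier copy instead of today's.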

Download the GOV.UK mirror to your local machine

You can download the GOV.UK mirror to your local machine.

You must first copy the GOV.UK mirror from the govuk-production-mirror-replica AWS S3 bucket to another bucket, and then download from that second bucket to your local machine.

The following content assumes that you want to download the GOV.UK mirror from the govuk-data-infrastructure-integration S3 bucket.

  1. Sign in to AWS.
  2. Select the govuk-datascienceusers role.
  3. Run the following code in your terminal to download the mirror locally:

    gds aws govuk-integration-datascience --assume-role-ttl 480m aws s3 sync s3://govuk-data-infrastructure-integration/{YYYYMMDD}-govuk-production-mirror-replica YOUR_LOCAL_FOLDER
    

    where {YYYYMMDD} is the date, in year-month-day format, that you used when you copied the mirror to the second bucket, and YOUR_LOCAL_FOLDER is a folder on your machine.
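
Because aws s3 sync can be rerun to pick up where a failed transfer stopped, a long download can be wrapped in a simple retry loop. The following is a sketch only, assuming a bash-compatible shell. The retry_sync function and the RETRY_DELAY variable are hypothetical names introduced for this example.

```shell
# Hypothetical helper: run a command until it succeeds, up to a maximum
# number of attempts.
# Usage: retry_sync MAX_ATTEMPTS COMMAND [ARGS...]
retry_sync() {
  local max_attempts="$1"
  shift
  local attempt=1
  # Rerun the command until it exits with status 0.
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    echo "command failed, retrying (attempt $attempt of $max_attempts)" >&2
    # Wait between attempts; RETRY_DELAY is in seconds and defaults to 30.
    sleep "${RETRY_DELAY:-30}"
  done
}

# Possible usage with the download command from this section:
#
#   retry_sync 5 gds aws govuk-integration-datascience --assume-role-ttl 480m \
#     aws s3 sync \
#     s3://govuk-data-infrastructure-integration/{YYYYMMDD}-govuk-production-mirror-replica \
#     YOUR_LOCAL_FOLDER
```

Each retry skips files that already arrived intact, so repeated attempts only transfer what is still missing.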

Downloading the mirror locally is time- and resource-intensive. To save time and resources, you can instead run your code on Amazon SageMaker and stream the data from S3.
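
As a rough illustration of streaming, the AWS CLI can write a single S3 object to standard output instead of saving it to disk. The sketch below assumes a bash-compatible shell, that the aws command is available with suitable credentials in the environment where it runs, and that you already know the key of the object you want. The function names are hypothetical, and YOUR_OBJECT_KEY is a placeholder because the key layout inside the mirror is not described here.

```shell
# Hypothetical helper: build the S3 URI of one object inside a dated copy
# of the mirror. $1 is the YYYYMMDD stamp, $2 is the object key.
mirror_object_uri() {
  local stamp="$1" key="$2"
  # Strip a single leading slash so the key joins cleanly onto the prefix.
  printf 's3://govuk-data-infrastructure-integration/%s-govuk-production-mirror-replica/%s\n' \
    "$stamp" "${key#/}"
}

# Hypothetical helper: stream one object to stdout.
# "aws s3 cp <uri> -" sends the object to standard output.
stream_mirror_object() {
  aws s3 cp "$(mirror_object_uri "$1" "$2")" -
}

# Possible usage: count the lines in one stored page without
# writing it to disk.
#
#   stream_mirror_object 20210729 YOUR_OBJECT_KEY | wc -l
```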