Environment data sync

Last updated: 23 Jul 2025

Production data is regularly copied to the staging and integration environments, a process often referred to as the “env sync” or “data sync”. This lets us test changes against real data more easily, and also acts as an automated test of our disaster recovery capabilities.

Staging is overwritten every night, whereas integration is overwritten only every Monday. Content designers use integration for training, so its data needs to persist for longer.

Troubleshooting

To check whether the env sync has succeeded for a given app and environment, visit the ‘db-backup’ application in Argo in the relevant environment and search for the corresponding cronjob, or use the kubectl command line. For example, to check Link Checker API on Integration, you could visit the db-backup application in Argo Integration and check the logs for the latest db-backup-link-checker-api-postgres job. Here’s how you might do the same with kubectl:

# Find the relevant db-backup jobs
$ kubectl get jobs -n apps | grep db-backup-link-checker-api-postgres

db-backup-link-checker-api-postgres-29219143                  Complete   1/1           62s        33h
db-backup-link-checker-api-postgres-29220583                  Complete   1/1           58s        9h

# Get the status of the job
$ kubectl describe job -n apps db-backup-link-checker-api-postgres-29220583 | grep Status
Pods Statuses:            0 Active (0 Ready) / 1 Succeeded / 0 Failed

# Assuming the job has failed, get the pod name for the job...
$ kubectl get pods -n apps -l job-name=db-backup-link-checker-api-postgres-29220583
db-backup-link-checker-api-postgres-29220583-abcde

# ...then get the logs for the container (assuming the container inside the pod is named 0-restore)
$ kubectl logs -n apps db-backup-link-checker-api-postgres-29220583-abcde -c 0-restore
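
The first step above (finding the most recent job) can be scripted. The following is a minimal sketch rather than an official tool: latest_job is a hypothetical helper name, it reads a job listing on standard input, and it assumes the numeric suffix of each job name increases with every scheduled run, as it does in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: print the newest job name for a given cronjob.
# Reads `kubectl get jobs` style output on stdin, so the parsing can be
# tried without cluster access.
# Assumption: job names are "<cronjob-name>-<number>" and the number grows
# with each scheduled run.
latest_job() {
  # $1 = cronjob name, e.g. db-backup-link-checker-api-postgres
  awk -v prefix="$1-" '
    index($1, prefix) == 1 {
      suffix = substr($1, length(prefix) + 1)
      # Keep only names whose remainder is purely numeric
      if (suffix ~ /^[0-9]+$/) print suffix, $1
    }' | sort -n | tail -n 1 | cut -d" " -f2
}
```

With cluster access, the listing could be piped in directly, for example `kubectl get jobs -n apps --no-headers | latest_job db-backup-link-checker-api-postgres`, and the resulting name passed to the describe and logs commands shown above.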

How it works

The env sync process is made up of many small Kubernetes cronjobs, one per app and environment, configured in the ‘db-backup’ chart. There is also a search-index-env-sync chart for copying search index data.

The ‘production’ cronjobs back up their respective application’s database data to a production S3 bucket called govuk-production-database-backups (using a ‘backup’ operation - see example). The bucket is configured to automatically replicate to S3 buckets in the other environments. You can read more about how GOV.UK data backups are configured in AWS.

The cronjobs in staging and integration pull from the backup S3 bucket in their environment, and replace the database contents of the given app. These are configured as ‘restore’ and ‘backup’ operations in the chart. Some apps on integration have additional operations to sanitise their data: this isn’t applied to staging as staging can only be accessed by those who have production access (and who therefore have access to the production equivalents already).
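
To make the flow above concrete, here is an illustrative dry run for a Postgres-backed app. It only prints the kind of commands a ‘backup’ and a ‘restore’ operation could run; none of it is taken from the db-backup chart. The function names, the `<app>/<date>.dump` object key layout, the DATABASE_URL variable, and the non-production bucket name passed to print_restore are all assumptions made for illustration.

```shell
#!/bin/sh
# Illustrative dry run only: prints commands, runs nothing against AWS or a
# database. All names below are hypothetical except the production bucket,
# which is the one named in the text above.

# Assumed object key layout: <app>/<date>.dump
backup_key() {
  printf '%s/%s.dump\n' "$1" "$2"
}

# Production side: dump the database and stream it to the production bucket.
print_backup() {
  app=$1
  day=$2
  printf 'pg_dump --format=custom "$DATABASE_URL" | aws s3 cp - s3://govuk-production-database-backups/%s\n' \
    "$(backup_key "$app" "$day")"
}

# Staging or integration side: stream the replicated object from that
# environment's bucket and overwrite the contents of the app's database.
print_restore() {
  bucket=$1
  app=$2
  day=$3
  printf 'aws s3 cp s3://%s/%s - | pg_restore --clean --if-exists --no-owner --dbname "$DATABASE_URL"\n' \
    "$bucket" "$(backup_key "$app" "$day")"
}
```

For example, `print_backup link-checker-api 2025-07-23` prints the production-side command, and `print_restore example-staging-bucket link-checker-api 2025-07-23` prints the matching restore, where example-staging-bucket stands in for the replicated bucket in the target environment.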

History

The Kubernetes cronjobs replace the old “govuk_env_sync” scripts, configured in Puppet and executed on Jenkins. That, in turn, replaced something called “env-sync-and-backup”.
