Skip to main content
Last updated: 31 May 2024

Environment data sync

Every night, production data is copied to the staging and integration environments - often referred to as the “env sync” or “data sync” process. This allows us to more easily test changes against real data, as well as acting as an automated test of our disaster recovery capabilities.

Troubleshooting

To check whether the env sync has succeeded for a given app and environment, visit the ‘db-backup’ application in Argo in the relevant environment, and search for the corresponding cronjob (or use the kubectl command line). For example, to check Contacts Admin on Integration, you could visit the db-backup application in Argo Integration and check the logs for the latest db-backup-contacts-admin-mysql job.

How it works

The env sync process is made up of a lot of small Kubernetes cronjobs - one per app and environment - configured in the ‘db-backup’ chart. There is also a search-index-env-sync chart for copying search index data.

The ‘production’ cronjobs back up their respective application’s database data to a production S3 bucket called govuk-production-database-backups (using a ‘backup’ operation - see example). The bucket is configured to automatically replicate to S3 buckets in the other environments. You can read more about how GOV.UK data backups are configured in AWS.

The cronjobs in staging and integration pull from the backup S3 bucket in their environment, and replace the database contents of the given app. These are configured as ‘restore’ and ‘backup’ operations in the chart. Some apps on integration have additional operations to sanitise their data: this isn’t applied to staging as staging can only be accessed by those who have production access (and who therefore have access to the production equivalents already).

History

The Kubernetes cronjobs replace the old “govuk_env_sync” scripts, configured in Puppet and executed on Jenkins. That, in turn, replaced something called “env-sync-and-backup”.