Last updated: 21 Nov 2024

Environment data sync

Production data is regularly copied to the staging and integration environments, a process often referred to as the “env sync” or “data sync”. This makes it easier to test changes against real data, and also acts as an automated test of our disaster recovery capabilities.

Staging is overwritten every night, whereas integration is overwritten only every Monday. Integration is synced less often because content designers use it for training and need their data to persist for longer.
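As a rough illustration of the two schedules, the Python sketch below works out when an environment's data will next be replaced, which is what a content designer training on integration would want to know. It assumes each sync runs in the early hours, so a sync due on a given day has already happened by the time anyone is working. The real cron times are defined in the db-backup chart and are not stated here, and the function name `next_overwrite` is hypothetical.

```python
from datetime import date, timedelta


def next_overwrite(environment: str, today: date) -> date:
    """Return the date on which the environment's data will next be replaced.

    Assumption (hypothetical): syncs run in the early hours, so today's sync,
    if there is one, has already happened.
    """
    if environment == "staging":
        # Staging is overwritten every night.
        return today + timedelta(days=1)
    if environment == "integration":
        # Integration is overwritten every Monday; date.weekday() is 0 for Monday,
        # so this always lands on the next Monday strictly after today.
        return today + timedelta(days=7 - today.weekday())
    raise ValueError(f"{environment!r} does not receive an env sync")
```

For example, on Thursday 21 November 2024, `next_overwrite("integration", date(2024, 11, 21))` returns `date(2024, 11, 25)`, the following Monday, whereas staging would be overwritten the next day.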

Troubleshooting

To check whether the env sync has succeeded for a given app and environment, open the ‘db-backup’ application in Argo for that environment and find the corresponding cronjob (or use kubectl from the command line). For example, to check Contacts Admin on integration, open the db-backup application in Argo Integration and check the logs of the latest db-backup-contacts-admin-mysql job.
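For the command-line route, the Python sketch below asks kubectl for the Jobs in a namespace as JSON, picks the newest one created by a given cronjob, and reports whether it finished. It relies only on standard Kubernetes Job fields (`metadata.ownerReferences`, `metadata.creationTimestamp` and `status.conditions`). It assumes kubectl is already authenticated against the cluster for the environment you want to check, and that the cronjob is named like the db-backup-contacts-admin-mysql example above. The namespace is a parameter because it is not stated here, and the helper names are hypothetical.

```python
import json
import subprocess
from typing import Optional


def latest_job(jobs: dict, cronjob: str) -> Optional[dict]:
    """Return the most recently created Job owned by `cronjob`, or None."""
    owned = [
        job
        for job in jobs.get("items", [])
        if any(
            ref.get("kind") == "CronJob" and ref.get("name") == cronjob
            for ref in job["metadata"].get("ownerReferences", [])
        )
    ]
    # creationTimestamp is an RFC 3339 UTC string, so lexical order is time order.
    return max(owned, key=lambda job: job["metadata"]["creationTimestamp"], default=None)


def job_state(job: dict) -> str:
    """Map a Job's terminal condition to 'Complete', 'Failed' or 'Not finished'."""
    for condition in job.get("status", {}).get("conditions", []):
        if condition.get("status") == "True" and condition.get("type") in ("Complete", "Failed"):
            return condition["type"]
    return "Not finished"


def check_env_sync(cronjob: str, namespace: str) -> str:
    """Summarise the latest run of `cronjob`, e.g. 'db-backup-contacts-admin-mysql'."""
    # Assumes the current kubectl context points at the environment's cluster.
    result = subprocess.run(
        ["kubectl", "get", "jobs", "--namespace", namespace, "--output", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    job = latest_job(json.loads(result.stdout), cronjob)
    if job is None:
        return f"No jobs found for {cronjob} in {namespace}"
    # To read the logs of this run: kubectl logs job/<name> --namespace <namespace>
    return f"{job['metadata']['name']}: {job_state(job)}"
```

If the summary reports `Failed`, running `kubectl logs` against the reported job shows the same output you would see in Argo.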

How it works

The env sync process is made up of many small Kubernetes cronjobs, one per app and environment, configured in the ‘db-backup’ chart. A separate search-index-env-sync chart copies search index data.

The production cronjobs back up each application’s database to a production S3 bucket called govuk-production-database-backups, using a ‘backup’ operation (see example). The bucket is configured to replicate automatically to S3 buckets in the other environments. You can read more about how GOV.UK data backups are configured in AWS.

The cronjobs in staging and integration pull from the backup S3 bucket in their own environment and replace the given app’s database contents. These are configured as ‘restore’ and ‘backup’ operations in the chart. Some apps on integration have additional operations to sanitise their data. Staging data is not sanitised, because staging can only be accessed by people who have production access and can therefore already see the production equivalents.
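The data flow described in this section can be modelled in a few lines of Python. This is a toy simulation, not the chart’s implementation: S3 buckets and databases are plain dictionaries and lists, replication is a deep copy, the replica bucket names and the sanitisation rule (overwriting an `email` field) are hypothetical, and the ‘backup’ operation that the chart also configures in staging and integration is left out. It does show the two properties that matter here: a restore replaces an environment’s data rather than merging into it, and only integration is sanitised.

```python
import copy

PRODUCTION_BUCKET = "govuk-production-database-backups"
# Hypothetical replica bucket names: the real ones are not given in this document.
REPLICA_BUCKETS = {
    "staging": "example-staging-database-backups",
    "integration": "example-integration-database-backups",
}


def backup(database: list, buckets: dict, bucket: str, app: str) -> None:
    """'backup' operation: store a snapshot of the app's database in a bucket."""
    buckets.setdefault(bucket, {})[app] = copy.deepcopy(database)


def replicate(buckets: dict) -> None:
    """S3 replication: copy the production bucket into each environment's bucket."""
    for replica in REPLICA_BUCKETS.values():
        buckets[replica] = copy.deepcopy(buckets[PRODUCTION_BUCKET])


def sanitise(database: list) -> list:
    """Hypothetical sanitisation rule: overwrite the email field on every row."""
    return [{**row, "email": "redacted@example.com"} for row in database]


def env_sync(production_db: list, app: str, environments: dict, sanitise_integration: bool) -> None:
    """Run one simulated sync of `app` from production into staging and integration."""
    buckets: dict = {}
    backup(production_db, buckets, PRODUCTION_BUCKET, app)  # production cronjob
    replicate(buckets)
    for env, bucket in REPLICA_BUCKETS.items():
        # 'restore' operation: read from this environment's own bucket and
        # replace, rather than merge into, the environment's existing data.
        data = copy.deepcopy(buckets[bucket][app])
        if env == "integration" and sanitise_integration:
            data = sanitise(data)
        environments[env] = data
```

Passing `sanitise_integration=False` corresponds to an app that has no extra sanitisation operations on integration.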

History

The Kubernetes cronjobs replace the old “govuk_env_sync” scripts, configured in Puppet and executed on Jenkins. That, in turn, replaced something called “env-sync-and-backup”.