Onsite backups failed
The backup machine (e.g. backup-1.management.integration) collects backups from the various data stores at 9am every morning.
The location of the backups is defined in govuk-puppet.
The most likely cause of a failure is that one of the backups it tries to collect is not ready yet when the process runs. We have an example of running the PostgreSQL backup earlier to deal with this situation.
To rerun an individual backup:
    # Replace <backup-hostname-from-alert> with the host named in the alert,
    # for example ip-1-2-3-4.eu-west-1.compute.internal
    gds govuk connect ssh -e production <backup-hostname-from-alert>
    sudo su - govuk-backup
    cd /etc/backup/
    ./001_directory_backup_postgresql_backups_postgresql_primary_1 # Or whichever script is relevant
There will be no output whilst the script is running. You should see a backup being created in the backup location, e.g. if the graphite backup is being re-run, check:
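Because the script prints nothing while it runs, one way to confirm that files are arriving is to count the recently modified files in the backup location. The sketch below is illustrative only: `count_recent_files` is a hypothetical helper, not part of the backup tooling, and `BACKUP_DIR` is a placeholder for the real backup location, which you need to supply yourself. It assumes a `find` that supports `-mmin` (GNU and BSD `find` both do).

```shell
#!/bin/sh
# Hypothetical helper: print how many regular files under a directory
# were modified within the last N minutes (default: 10).
count_recent_files() {
    dir="$1"
    minutes="${2:-10}"
    # -mmin -N matches files modified less than N minutes ago.
    find "$dir" -type f -mmin "-$minutes" | wc -l | tr -d ' '
}

# Example usage. BACKUP_DIR is a placeholder, not the real path:
# count_recent_files "$BACKUP_DIR" 10
```

A count that grows between runs suggests the backup is being written. A count that stays at zero for several minutes suggests the script is not copying anything.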
If the script reports “Permission denied” errors on the files it is trying to copy, the backup on the source machine has probably not finished yet. Only after that has finished will it change permissions to the