Backup and restore databases in AWS RDS
Backups of RDS instances are taken nightly and stored in Amazon S3. SQL dumps are also taken nightly from the various db_admin machines via the govuk_env_sync process.
Restore an RDS instance via the AWS CLI
This section shows how to restore a database (DB) instance from a DB snapshot with the AWS CLI.
Before you get started you need to know:
- The environment in which you are restoring the database - replace <environment> throughout the scripts
- The name of the database which needs to be restored - if you are restoring multiple databases, you will need to carry out these steps again for each one
For more information, read the AWS documentation on Restoring from a DB Snapshot.
1. Retrieve a list of all snapshot ARNs for your application
In this example we are using local-links-manager:
gds-cli aws govuk-<environment>-admin aws rds describe-db-snapshots --query 'DBSnapshots[].DBSnapshotArn' | grep local-links-manager
Then select the ARN with the latest date.
snapshot_arn=<arn_of_snapshot_database>
For example: snapshot_arn=arn:aws:rds:eu-west-1:210287912431:snapshot:rds:local-links-manager-postgres-2022-07-05-01-09
Check that the right snapshot ARN has been stored:
echo ${snapshot_arn}
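Picking the latest ARN by eye is easy to get wrong, so the selection can be scripted. The following is a minimal sketch, not part of the documented process: latest_snapshot_arn is a hypothetical helper, and it assumes the ARNs are supplied as plain text (for example by adding --output text to the command in step 1) and that each ARN ends in a YYYY-MM-DD-HH-MM timestamp, as in the example above.

```shell
# Hypothetical helper: read snapshot ARNs on stdin (one per line, or
# tab-separated as produced by `--output text`) and print the newest ARN
# whose text contains the given application name.
# Assumption: every automated snapshot ARN ends in YYYY-MM-DD-HH-MM.
latest_snapshot_arn() {
  app_name="$1"
  tr '\t' '\n' \
    | grep -F -- "${app_name}" \
    | sed -n -E 's/^.*([0-9]{4}(-[0-9]{2}){4})$/\1 &/p' \
    | sort \
    | tail -n 1 \
    | cut -d ' ' -f 2
}

# Example usage (not run here, it needs AWS access):
# snapshot_arn=$(gds-cli aws govuk-<environment>-admin aws rds describe-db-snapshots \
#   --query 'DBSnapshots[].DBSnapshotArn' --output text \
#   | latest_snapshot_arn local-links-manager)
```

The helper puts the trailing timestamp in front of each ARN before sorting, so ARNs without a timestamp suffix are ignored rather than sorted to the end by accident. Still check the result with echo ${snapshot_arn} before continuing.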
2. Find which database the snapshot was generated by
You can get this using the DBInstanceIdentifier, for example:
- db_instance_identifier=local-links-manager-postgres
gds-cli aws govuk-<environment>-admin aws rds describe-db-snapshots --db-snapshot-identifier ${snapshot_arn} --snapshot-type automated --query 'DBSnapshots[].DBInstanceIdentifier'
Store the DBInstanceIdentifier as a variable:
db_instance_identifier=<DB_Instance_Identifier>
3. Ensure the restored database has the same security groups
The restored database must have the same security groups as the original instance the snapshot came from, and be in the same VPC (that’s the “subnet group name” parameter). Otherwise, apps won’t be able to connect to it.
After running the command below, you will have all the parameters you need (snapshot-arn, db-instance-identifier, security-group-id, db-parameter-group-name, and db-subnet-group-name) to restore the database and change the restored database’s security groups to match the original’s.
gds-cli aws govuk-<environment>-admin aws rds describe-db-instances --db-instance-identifier ${db_instance_identifier} --query 'DBInstances[].[VpcSecurityGroups[].VpcSecurityGroupId,DBParameterGroups[].DBParameterGroupName,DBSubnetGroup.DBSubnetGroupName]'
Example of the output:
- vpc-security-group-id = sg-XXXXXXXX
- db-parameter-group-name = local-links-manager-postgres-XXXXXXXXXX
- db-subnet-group-name = blue-govuk-rds-subnet
Store each value from the output as a variable:
vpc_security_group_id=<replace_with_previous_output>
db_parameter_group_name=<replace_with_previous_output>
db_subnet_group_name=<replace_with_previous_output>
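Copying three values by hand invites typos, so they can be captured directly. The sketch below is one possible approach under stated assumptions: govuk_aws, describe_original and load_original_settings are hypothetical helper names, environment is a hypothetical variable holding the environment name, and the instance is assumed to have exactly one security group and one parameter group.

```shell
# Wrapper around the command prefix used throughout this page.
# `environment` is a hypothetical variable, for example environment=integration.
govuk_aws() {
  gds-cli aws "govuk-${environment}-admin" aws "$@"
}

# Hypothetical helper: print one attribute of the original instance as text.
describe_original() {
  govuk_aws rds describe-db-instances \
    --db-instance-identifier "${db_instance_identifier}" \
    --query "DBInstances[0].$1" \
    --output text
}

# Assumption: a single security group and parameter group. If the instance
# has several, copy the values from the full output instead.
load_original_settings() {
  vpc_security_group_id=$(describe_original 'VpcSecurityGroups[0].VpcSecurityGroupId')
  db_parameter_group_name=$(describe_original 'DBParameterGroups[0].DBParameterGroupName')
  db_subnet_group_name=$(describe_original 'DBSubnetGroup.DBSubnetGroupName')
}

# Example usage (not run here, it needs AWS access):
# load_original_settings
# echo "${vpc_security_group_id} ${db_parameter_group_name} ${db_subnet_group_name}"
```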
4. Restore the database instance from a snapshot
Using the stored variables from the previous steps:
gds-cli aws govuk-<environment>-admin aws rds restore-db-instance-from-db-snapshot --db-subnet-group-name ${db_subnet_group_name} --db-instance-identifier restored-${db_instance_identifier} --db-snapshot-identifier ${snapshot_arn}
To see the newly created database instance, log into AWS Console > RDS > Databases > filter for your database name. You should see the original and newly created one.
5. Test the database has been fully restored
Before moving on to the next step, check that the database has been fully restored and is ready to be used:
gds-cli aws govuk-<environment>-admin aws rds wait db-instance-available --db-instance-identifier restored-${db_instance_identifier}
This command will wait until the database is ready, and then exit without any output.
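A single wait db-instance-available call polls every 30 seconds and gives up with a non-zero exit code after 60 checks (about 30 minutes), which a large restore can exceed. The following sketch retries the waiter a few times; wait_for_restore and govuk_aws are hypothetical helper names, and environment is a hypothetical variable holding the environment name.

```shell
# Wrapper around the command prefix used throughout this page.
# `environment` is a hypothetical variable, for example environment=integration.
govuk_aws() {
  gds-cli aws "govuk-${environment}-admin" aws "$@"
}

# Hypothetical helper: retry the waiter up to max_attempts times (default 3).
# Arguments: instance identifier, optional maximum number of attempts.
wait_for_restore() {
  max_attempts="${2:-3}"
  attempt=1
  while [ "${attempt}" -le "${max_attempts}" ]; do
    if govuk_aws rds wait db-instance-available --db-instance-identifier "$1"; then
      echo "available"
      return 0
    fi
    echo "still waiting (attempt ${attempt} of ${max_attempts})" >&2
    attempt=$((attempt + 1))
  done
  return 1
}

# Example usage (not run here, it needs AWS access):
# wait_for_restore "restored-${db_instance_identifier}"
```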
6. Modify the restored database instance
If you are doing this as a drill you can update the db_instance_identifier to something distinguishable so that it is easier to find later in the list of databases in AWS, for example:
- db_instance_identifier =
-
gds-cli aws govuk-<environment>-admin aws rds modify-db-instance --db-instance-identifier restored-${db_instance_identifier} --vpc-security-group-ids ${vpc_security_group_id} --db-parameter-group-name ${db_parameter_group_name}
7. Update the DNS
Once restored, you will need to update the DNS so that the restored database can be accessed on the internal domain.
To get the endpoint of the restored instance:
gds-cli aws govuk-<environment>-admin aws rds describe-db-instances --db-instance-identifier restored-${db_instance_identifier} --query 'DBInstances[].Endpoint'
Example of output:
- Address = restored-local-links-manager-postgres.XXXXXX.eu-west-1.rds.amazonaws.com
- Port = 54XX
Store the output as a variable:
endpoint_address=<replace_with_address_output_above>
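The address and port can also be read into variables in one call. This is a sketch under stated assumptions rather than the documented method: load_restored_endpoint and govuk_aws are hypothetical helper names, endpoint_port is an extra variable not used elsewhere on this page, and environment is a hypothetical variable holding the environment name.

```shell
# Wrapper around the command prefix used throughout this page.
# `environment` is a hypothetical variable, for example environment=integration.
govuk_aws() {
  gds-cli aws "govuk-${environment}-admin" aws "$@"
}

# Hypothetical helper: fetch the restored instance's endpoint as
# "address<TAB>port" and split it into two variables.
load_restored_endpoint() {
  endpoint=$(govuk_aws rds describe-db-instances \
    --db-instance-identifier "restored-${db_instance_identifier}" \
    --query 'DBInstances[0].Endpoint.[Address,Port]' \
    --output text) || return 1
  endpoint_address=$(printf '%s\n' "${endpoint}" | cut -f 1)
  endpoint_port=$(printf '%s\n' "${endpoint}" | cut -f 2)
}

# Example usage (not run here, it needs AWS access):
# load_restored_endpoint && echo "${endpoint_address}:${endpoint_port}"
```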
8. Get the zone ID of the GOV.UK internal domain name
(The ID appears in the output in the form “Id”: “/hostedzone/ZXXXXX”. Only the part beginning with Z is required.)
For example:
- NextHostedZoneId = ZXXXXXXXXXX
gds-cli aws govuk-<environment>-admin aws route53 list-hosted-zones-by-name --dns-name ${endpoint_address} --max-items 1
Store the output as a variable:
next_hosted_zone_id=<replace_with_NextHostedZoneId>
9. Create a local JSON file to update AWS Route53
The AWS CLI for Route53 has no command that updates a single DNS record directly. It requires a change batch, usually supplied as a file, even if there’s only one change.
For this step, create a local file, for example /var/tmp/update_dns.json, containing the code below.
You can store the file anywhere on your local drive, but remember to update the file path in the command that applies the changes.
For example:
- database-name = restored-<name_old_db>
- stack-name = blue
- govuk-internal-domain = <environment>.govuk-internal.digital
- restored-db-endpoint = run echo ${endpoint_address} in your terminal, then copy and paste the output into the file
{
  "Comment": "Manual DB restore",
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "<database-name>.<stack-name>.<govuk-internal-domain>",
        "Type": "CNAME",
        "TTL": 300,
        "ResourceRecords": [
          {
            "Value": "<restored-db-endpoint>"
          }
        ]
      }
    }
  ]
}
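Instead of pasting values into the file by hand, the file can be generated from the variables already set. This is a minimal sketch: write_change_batch is a hypothetical helper, and the record name in the usage example is built from the placeholder values listed above, so check it against the real record before applying it.

```shell
# Hypothetical helper: write the Route53 change batch for the restored database.
# Arguments: record name, restored endpoint address, output file path.
write_change_batch() {
  cat > "$3" <<EOF
{
  "Comment": "Manual DB restore",
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "$1",
        "Type": "CNAME",
        "TTL": 300,
        "ResourceRecords": [
          { "Value": "$2" }
        ]
      }
    }
  ]
}
EOF
}

# Example usage, with "blue" as the stack name as in the list above:
# write_change_batch \
#   "restored-${db_instance_identifier}.blue.<environment>.govuk-internal.digital" \
#   "${endpoint_address}" \
#   /var/tmp/update_dns.json
```

Open the generated file and confirm the Name and Value fields before applying the change.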
Apply these changes with the following command:
gds-cli aws govuk-<environment>-admin aws route53 change-resource-record-sets --hosted-zone-id ${next_hosted_zone_id} --change-batch file:///var/tmp/update_dns.json
The output should look something like:
{
  "ChangeInfo": {
    "Id": "/change/C1045684TR3O47QOC1T6",
    "Status": "INSYNC",
    "SubmittedAt": "2023-08-23T15:16:15.298000+00:00",
    "Comment": "Manual DB restore"
  }
}
It can take a couple of minutes for the change to be applied, and the status might be PENDING in the meantime. You can check the status by running:
gds-cli aws govuk-<environment>-admin aws route53 get-change --id /change/<ChangeInfo_Id>
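Rather than re-running the status check by hand, the check can be repeated until the change reports INSYNC. The sketch below is one way to do that: wait_for_dns_change and govuk_aws are hypothetical helper names, environment is a hypothetical variable holding the environment name, and the default interval and attempt limit are arbitrary choices.

```shell
# Wrapper around the command prefix used throughout this page.
# `environment` is a hypothetical variable, for example environment=integration.
govuk_aws() {
  gds-cli aws "govuk-${environment}-admin" aws "$@"
}

# Hypothetical helper: poll a Route53 change until its status is INSYNC.
# Arguments: change ID, seconds between checks (default 10),
# maximum number of checks (default 30).
wait_for_dns_change() {
  change_id="$1"
  interval="${2:-10}"
  max_checks="${3:-30}"
  check=1
  while [ "${check}" -le "${max_checks}" ]; do
    status=$(govuk_aws route53 get-change --id "${change_id}" \
      --query 'ChangeInfo.Status' --output text) || return 1
    if [ "${status}" = "INSYNC" ]; then
      return 0
    fi
    sleep "${interval}"
    check=$((check + 1))
  done
  return 1
}

# Example usage (not run here, it needs AWS access):
# wait_for_dns_change /change/<ChangeInfo_Id> && echo "DNS change applied"
```

The AWS CLI also provides a built-in waiter, aws route53 wait resource-record-sets-changed --id, which may be simpler if its default timing suits you.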
If you stored the JSON file somewhere other than /var/tmp/update_dns.json, remember to use that file path in the change-resource-record-sets command above.
The restore is now finished!
10. Check the changes in Route53
To see the changes in AWS Route 53, log into AWS Console > Route 53 > Hosted Zones > Check the Public and private link to
Point the application to the new RDS instance
Once you have restored a database in AWS RDS, you now need to point the corresponding app towards it.
1. Make a change to the database contents
Through the app’s user interface, or via the app console or database console, make a change that you can use as a sense check to verify that the database switch has been successful.
For example, you might create a draft edition of something, or modify or delete a record. After switching to the restored database, your changes should be undone.
In this example we want to make a change to the database local-links-manager_production, such as deleting an old record:
sudo psql -U aws_db_admin -h local-links-manager-postgres -d local-links-manager_production
2. Connect to the restored backup database
This requires updating the CNAME for local-links-manager-postgres.
- In AWS Route53 navigate to Route 53 > Hosted zones > integration.govuk-internal.digital
- In the list, search for the hostname and select it to edit the record.
- Make a note of the current value if you are planning on reconnecting to the original database afterwards, for example if you’re carrying this out as a drill.
- Replace the value with the endpoint of the restored RDS instance and save your changes. This takes about 60 seconds; you can click “view status” for updates. Once updated it will say INSYNC.
- SSH back into the machine and query for the record you deleted. If the record is back, this verifies that the app is now using the backup database.
This is only a temporary solution, to be used in an incident. You should continue onto the next section for a permanent solution.
Ensure your setup will continue to work if infrastructure is reprovisioned
If new infrastructure is provisioned, then the “Point the application to the new RDS instance” solution above will break, as Terraform would fall out of sync with the manual changes. We either need to update Terraform with our changes, or manually get our infrastructure back to how it used to be.
For the purposes of drilling, it’s quicker and easier to do the latter. Simply repeat the steps to point the application to the new RDS instance, but this time connect to the original database. If you need to find the endpoint again for the original database, navigate to Amazon RDS, find the database in the list and look for Endpoint & Port under the Connectivity & Security tab.
If this is not a drill, then you would not want to connect to the original database again - you’ve created a database from a backup for a reason! The safest approach would be to update Terraform to refer to the new database.
Alternatively, you could:
- Delete the original database
- Create a snapshot from your new database (which we’ll call the “temporary” database)
- Restore an RDS instance (we’ll call this the “new” database) from your temporary database’s snapshot, but using the original hostname this time
- Repeat the steps to point the application to the new RDS instance, this time connecting to the “new” database.
- Delete the temporary database
Delete an obsolete database
PLEASE BE CAREFUL WHEN EXECUTING THIS COMMAND AS IT CANNOT BE UNDONE
For reference, here is the AWS documentation for deleting a database instance.
The restored database is likely to be missing data written after the snapshot was taken, so you will want to keep a copy of the original database for comparison before deleting it.
The command below will create a DB snapshot before the DB instance is deleted. If you don’t want this, omit the --final-db-snapshot-identifier parameter.
gds-cli aws govuk-<environment>-admin aws rds delete-db-instance --db-instance-identifier <db_instance_identifier> --final-db-snapshot-identifier <snapshot_name>
You can check the snapshot is available by navigating to RDS > Snapshots in the AWS console. Now that the original RDS instance has been removed, its name is free to be used for the permanent fix.
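Because deletion cannot be undone, it can help to wrap the command in a confirmation prompt. The sketch below is an optional safeguard, not part of the documented process: confirm_and_delete and govuk_aws are hypothetical helper names, and environment is a hypothetical variable holding the environment name.

```shell
# Wrapper around the command prefix used throughout this page.
# `environment` is a hypothetical variable, for example environment=integration.
govuk_aws() {
  gds-cli aws "govuk-${environment}-admin" aws "$@"
}

# Hypothetical helper: only delete the instance if the operator retypes
# its identifier exactly. A final snapshot is always taken.
# Arguments: instance identifier, final snapshot name.
confirm_and_delete() {
  db_id="$1"
  final_snapshot="$2"
  printf 'Type "%s" to confirm deletion: ' "${db_id}" >&2
  read -r answer || return 1
  if [ "${answer}" != "${db_id}" ]; then
    echo "Identifier did not match, nothing deleted." >&2
    return 1
  fi
  govuk_aws rds delete-db-instance \
    --db-instance-identifier "${db_id}" \
    --final-db-snapshot-identifier "${final_snapshot}"
}

# Example usage (not run here, it needs AWS access):
# confirm_and_delete <db_instance_identifier> <snapshot_name>
```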