Last updated: 14 Apr 2023

Backup and restore databases in AWS RDS

Backups of RDS instances are taken nightly. They are stored in Amazon S3. SQL dumps are also taken nightly from the various db_admin machines via the govuk_env_sync process.

Restore an RDS instance via the AWS CLI

This guide shows how to restore a database (DB) instance from a DB snapshot using the AWS CLI.

Before you get started you need to know:

  • The environment in which you are restoring the database - replace <environment> throughout the commands
  • The name of the database which needs to be restored - if you are restoring multiple databases, you will need to carry out these steps again for each one

For more information, read the AWS documentation on Restoring from a DB Snapshot.

1. Retrieve a list of all snapshot ARNs for your application

In this example we are using local-links-manager:

gds-cli aws govuk-<environment>-admin aws rds describe-db-snapshots --query 'DBSnapshots[].DBSnapshotArn' | grep local-links-manager

Then select the ARN with the latest date.

snapshot_arn=<arn_of_snapshot_database>

For example:

snapshot_arn=arn:aws:rds:eu-west-1:210287912431:snapshot:rds:local-links-manager-postgres-2022-07-05-01-09

Check that the correct snapshot ARN has been stored:

echo ${snapshot_arn}
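
Picking the latest ARN by eye is error-prone when the list is long. As a minimal sketch, the hypothetical helper below (latest_snapshot_arn is not part of gds-cli or the AWS CLI) reads a list shaped like the grep output and prints the ARN that sorts last. It assumes every ARN in the list belongs to the same database instance and ends in a YYYY-MM-DD-HH-MM timestamp, so that text order matches date order. Apart from the 2022-07-05 ARN, the sample values are made up for illustration.

```shell
# Hypothetical helper, not part of gds-cli or the AWS CLI.
# Reads snapshot ARNs on stdin (one per line, JSON quotes and commas allowed)
# and prints the one that sorts last.
# Assumption: all ARNs are for the same instance and end in YYYY-MM-DD-HH-MM.
latest_snapshot_arn() {
  tr -d '", ' | grep -v '^$' | LC_ALL=C sort | tail -n 1
}

# Demo on sample data shaped like the output of the describe-db-snapshots command.
sample='    "arn:aws:rds:eu-west-1:210287912431:snapshot:rds:local-links-manager-postgres-2022-07-04-01-09",
    "arn:aws:rds:eu-west-1:210287912431:snapshot:rds:local-links-manager-postgres-2022-07-05-01-09",
    "arn:aws:rds:eu-west-1:210287912431:snapshot:rds:local-links-manager-postgres-2022-07-03-01-10",'
snapshot_arn=$(printf '%s\n' "$sample" | latest_snapshot_arn)
echo "${snapshot_arn}"
```

To use it for real, you could pipe the describe-db-snapshots and grep command from this step into latest_snapshot_arn, then check the result before continuing.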

2. Find which database the snapshot was generated by

You can get this by querying the snapshot for its DBInstanceIdentifier:

gds-cli aws govuk-<environment>-admin aws rds describe-db-snapshots --db-snapshot-identifier ${snapshot_arn} --snapshot-type automated --query 'DBSnapshots[].DBInstanceIdentifier'

Example of the output:

  • db_instance_identifier=local-links-manager-postgres

Store the DBInstanceIdentifier as a variable:

db_instance_identifier=<DB_Instance_Identifier>
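
As a cross-check, the identifier can usually also be read off the ARN itself. This sketch assumes the automated snapshot naming seen in the example ARN (rds:<instance-identifier>-YYYY-MM-DD-HH-MM). instance_id_from_arn is a hypothetical helper, and the describe-db-snapshots output remains the authoritative source.

```shell
# Hypothetical helper: derive the DB instance identifier from an automated snapshot ARN.
# Assumption: the ARN ends in "<instance-identifier>-YYYY-MM-DD-HH-MM".
instance_id_from_arn() {
  name=${1##*:}   # drop everything up to and including the last colon
  # strip the trailing -YYYY-MM-DD-HH-MM timestamp
  echo "${name%-[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]-[0-9][0-9]-[0-9][0-9]}"
}

derived_id=$(instance_id_from_arn "arn:aws:rds:eu-west-1:210287912431:snapshot:rds:local-links-manager-postgres-2022-07-05-01-09")
echo "${derived_id}"
```

If the derived value differs from the one returned by the AWS CLI, trust the CLI output and investigate before restoring.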

3. Ensure the restored database has the same security groups

The restored database must have the same security groups and be in the same VPC (set by the “subnet group name” parameter) as the original instance the snapshot came from. Otherwise, apps won’t be able to connect to it.

After running the command below, you will have all the parameters you need (snapshot-arn, db-instance-identifier, security-group-id, db-parameter-group-name, and db-subnet-group-name) to restore the database and change the restored database’s security groups to match the original’s.

gds-cli aws govuk-<environment>-admin aws rds describe-db-instances --db-instance-identifier ${db_instance_identifier} --query 'DBInstances[].[VpcSecurityGroups[].VpcSecurityGroupId,DBParameterGroups[].DBParameterGroupName,DBSubnetGroup.DBSubnetGroupName]'

Example of the output:

  • vpc-security-group-id = sg-XXXXXXXX
  • db-parameter-group-name = local-links-manager-postgres-XXXXXXXXXX
  • db-subnet-group-name = blue-govuk-rds-subnet

Store each value as a variable:

vpc_security_group_id=<replace_with_previous_output>
db_parameter_group_name=<replace_with_previous_output>
db_subnet_group_name=<replace_with_previous_output>
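
The restore and modify commands in the next steps depend on the variables set so far, and an unset variable silently expands to nothing. A minimal sketch of a guard you could run first (require_vars is a hypothetical helper, written for a POSIX-style shell):

```shell
# Hypothetical guard: fail if any named shell variable is unset or empty.
require_vars() {
  missing=0
  for var_name in "$@"; do
    eval "var_value=\${$var_name-}"   # look up the variable by name
    if [ -z "$var_value" ]; then
      echo "Missing variable: $var_name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call with the variables set in steps 1 to 3 (not run here):
# require_vars snapshot_arn db_instance_identifier vpc_security_group_id \
#   db_parameter_group_name db_subnet_group_name && echo "All variables set"
```

The helper only checks that each variable is non-empty. It does not check that the values are correct.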

4. Restore the database instance from a snapshot

Using the stored variables from the previous steps:

gds-cli aws govuk-<environment>-admin aws rds restore-db-instance-from-db-snapshot --db-subnet-group-name ${db_subnet_group_name} --db-instance-identifier restored-${db_instance_identifier} --db-snapshot-identifier ${snapshot_arn}

To see the newly created database instance, log into AWS Console > RDS > Databases > filter for your database name. You should see the original and newly created one.

5. Test the database has been fully restored

Before moving on to the next step, check that the database has been fully restored and is ready to use:

gds-cli aws govuk-<environment>-admin aws rds wait db-instance-available --db-instance-identifier restored-${db_instance_identifier}

This command will wait until the database is ready, and then exit without any output.
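
The wait command polls every 30 seconds and gives up with a non-zero exit code after 60 failed checks (about 30 minutes), which a large restore could exceed. A minimal sketch of a wrapper that re-runs a command a limited number of times (retry is a hypothetical helper, not part of gds-cli or the AWS CLI):

```shell
# Hypothetical helper: run a command until it succeeds, at most max_attempts times.
retry() {
  max_attempts=$1
  shift
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Gave up after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
  done
}

# Example call (not run here):
# retry 3 gds-cli aws govuk-<environment>-admin aws rds wait db-instance-available \
#   --db-instance-identifier "restored-${db_instance_identifier}"
```

The wrapper returns as soon as the wrapped command succeeds, so it adds no delay when the database becomes available within the first wait.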

6. Modify the restored database instance

Apply the security groups and parameter group of the original instance to the restored one:

gds-cli aws govuk-<environment>-admin aws rds modify-db-instance --db-instance-identifier restored-${db_instance_identifier} --vpc-security-group-ids ${vpc_security_group_id} --db-parameter-group-name ${db_parameter_group_name}

7. Update the DNS

Once restored, you will need to update the DNS so that the restored database can be accessed on the internal domain.

To get the endpoint of the restored instance:

gds-cli aws govuk-<environment>-admin aws rds describe-db-instances --db-instance-identifier restored-${db_instance_identifier} --query 'DBInstances[].Endpoint'

Example of the output:

  • Address = restored-local-links-manager-postgres.XXXXXX.eu-west-1.rds.amazonaws.com
  • Port = 54XX

Store the output as a variable:

endpoint_address=<replace_with_address_output_above>

8. Get the zone ID of the GOV.UK internal domain name

gds-cli aws govuk-<environment>-admin aws route53 list-hosted-zones-by-name --dns-name ${endpoint_address} --max-items 1

The ID will be in the form “Id”: “/hostedzone/ZXXXXX”. Only the Z section is required.

For example:

  • NextHostedZoneId = ZXXXXXXXXXX

Store the output as a variable:

next_hosted_zone_id=<replace_with_NextHostedZoneId>
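
If the value you copied is in the full “/hostedzone/ZXXXXX” form, only the Z section is needed. A minimal sketch using shell parameter expansion (zone_id_only is a hypothetical helper and the ID below is a placeholder):

```shell
# Hypothetical helper: keep only the part after the last "/" of a hosted zone ID.
zone_id_only() {
  echo "${1##*/}"
}

# Placeholder ID for illustration only.
next_hosted_zone_id=$(zone_id_only "/hostedzone/ZXXXXXXXXXX")
echo "${next_hosted_zone_id}"
```

An ID that has no prefix passes through unchanged, so the helper is safe to apply either way.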

9. Create a local JSON file to update AWS Route53

The AWS CLI for Amazon Route53 doesn’t have a command to update just one DNS record. Changes are submitted as a batch (even if there’s only one), which is easiest to supply as a file.

For this step you will need to create a local file, for example /var/tmp/update_dns.json, with the content below.

You can store the file anywhere on your local drive, but remember to update the file path in the command that applies the changes.

Replace the placeholders in the file. For example:

  • database-name = restored-<name_old_db>
  • stack-name = blue
  • govuk-internal-domain = <environment>.govuk-internal.digital
  • restored-db-endpoint = run echo ${endpoint_address} in your terminal, then copy and paste the output into the file
{
    "Comment": "Manual DB restore",
    "Changes": [
        {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "<database-name>.<stack-name>.<govuk-internal-domain>",
                "Type": "CNAME",
                "TTL": 300,
                "ResourceRecords": [
                    {
                        "Value": "<restored-db-endpoint>"
                    }
                ]
            }
        }
    ]
}
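
Rather than editing the file by hand, you could generate it from shell variables. A minimal sketch, assuming a hypothetical helper write_change_batch and illustrative argument values. The JSON it writes has the same shape as the file above.

```shell
# Hypothetical helper: write a Route 53 change batch file for one CNAME record.
# Arguments: record name, restored DB endpoint address, output file path.
write_change_batch() {
  cat > "$3" <<EOF
{
    "Comment": "Manual DB restore",
    "Changes": [
        {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "$1",
                "Type": "CNAME",
                "TTL": 300,
                "ResourceRecords": [
                    {
                        "Value": "$2"
                    }
                ]
            }
        }
    ]
}
EOF
}

# Example call (values are illustrative; substitute your own, not run here):
# write_change_batch \
#   "restored-local-links-manager-postgres.blue.<environment>.govuk-internal.digital" \
#   "${endpoint_address}" /var/tmp/update_dns.json
```

Check the generated file before applying it, because the helper does no validation of the record name or endpoint.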

Apply these changes with the following command. If you changed the file path in the previous step, remember to change it in the command as well:

gds-cli aws govuk-<environment>-admin aws route53 change-resource-record-sets --hosted-zone-id ${next_hosted_zone_id} --change-batch file:///var/tmp/update_dns.json

The restore is now finished!

10. Check the changes in Route53

To see the changes in AWS Route 53, log into AWS Console > Route 53 > Hosted Zones, check the public and private links to .govuk-internal.digital and search for the restored DB.

Point the application to the new RDS instance

Once you have restored a database in AWS RDS, you need to point the corresponding app towards it.

1. Make a change to the database contents

Through the app’s user interface, or via the app console or database console, make a change that you can use as a sense check to verify that the database switch has been successful.

For example, you might create a draft edition of something, or modify or delete a record. After switching to the restored database, your changes should be undone.

In this example we want to make a change to the database local-links-manager_production, such as deleting an old record:

sudo psql -U aws_db_admin -h local-links-manager-postgres -d local-links-manager_production

2. Connect to the restored backup database

This requires updating the CNAME for local-links-manager-postgres.

  1. In AWS Route53 navigate to Route 53 > Hosted zones > integration.govuk-internal.digital
  2. In the list search for the hostname and select it to edit the record.
  3. Make a note of the current value if you are planning on reconnecting to the original database afterwards, for example if you’re carrying this out as a drill.
  4. Replace the value with the endpoint of the restored RDS instance and save your changes. This takes about 60 seconds. You can click “view status” for updates. Once updated it will say INSYNC.
  5. SSH back into the machine and query for the record you deleted. If the record is back, the app is now using the restored database.

This is only a temporary solution, to be used in an incident. You should continue onto the next section for a permanent solution.

Ensure your setup will continue to work if infrastructure is reprovisioned

If new infrastructure is provisioned, then the “Point the application to the new RDS instance” solution above will break, as Terraform would fall out of sync with the manual changes. We either need to update Terraform with our changes, or manually get our infrastructure back to how it used to be.

For the purposes of drilling, it’s quicker and easier to do the latter. Simply repeat the steps to point the application to the new RDS instance, but this time connect to the original database. If you need to find the endpoint again for the original database, navigate to Amazon RDS, find the database in the list and look for Endpoint & Port under the Connectivity & Security tab.

If this is not a drill, then you would not want to connect to the original database again - you’ve created a database from a backup for a reason! The safest approach would be to update Terraform to refer to the new database.

Alternatively, you could:

  1. Delete the original database
  2. Create a snapshot from your new database (which we’ll call the “temporary” database)
  3. Restore an RDS instance (we’ll call this the “new” database) from your temporary database’s snapshot, but using the original hostname this time
  4. Repeat the steps to point the application to the new RDS instance, this time connecting to the “new” database.
  5. Delete the temporary database

Delete an obsolete database

PLEASE BE CAREFUL WHEN EXECUTING THIS COMMAND AS IT CANNOT BE UNDONE

For reference, here is the AWS documentation for deleting a database instance.

It is likely that the restored database is missing data written since the snapshot was taken, so you will want to keep a copy of the original database for comparison before deleting it.

The command below will create a DB snapshot before the DB instance is deleted. If you don’t want this, omit the --final-db-snapshot-identifier parameter.

gds-cli aws govuk-<environment>-admin aws rds delete-db-instance --db-instance-identifier <db_instance_identifier> --final-db-snapshot-identifier <snapshot_name>

You can check the snapshot is available by navigating to RDS > Snapshots in the AWS console. Removing the original RDS instance frees up its name for the permanent fix.
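
Because deletion cannot be undone, it can help to make the operator retype the identifier before the delete command runs. A minimal sketch (confirm_identifier is a hypothetical helper, not part of gds-cli or the AWS CLI):

```shell
# Hypothetical guard: succeed only if the line read from stdin matches the identifier.
confirm_identifier() {
  [ -n "$1" ] || return 1
  printf 'Type "%s" to confirm deletion: ' "$1" >&2
  IFS= read -r typed || true
  [ "$typed" = "$1" ]
}

# Example call (not run here):
# confirm_identifier "<db_instance_identifier>" && gds-cli aws govuk-<environment>-admin \
#   aws rds delete-db-instance --db-instance-identifier "<db_instance_identifier>" \
#   --final-db-snapshot-identifier "<snapshot_name>"
```

The delete command only runs if the typed text matches exactly, which guards against deleting the wrong instance after a copy and paste mistake.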