Backup and restore Elasticsearch indices

The Elasticsearch indices used for search are backed up to disk using es_dump_restore.


This will change to S3 snapshots when production is moved to AWS.

Creating a backup

Sometimes you may need to take a backup before a critical operation. To do this, SSH to a rummager-elasticsearch box and run:

$ es_dump http://localhost:9200 /var/es_dump

The first argument is the Elasticsearch instance and the second is the directory in which to store the output. The user running the dump needs permission to write to this directory.
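
Because the dump needs a writable target directory, a quick pre-flight check can save a wasted run. The following is a minimal sketch of one way to wrap the command; the function name can_write_dump_dir is hypothetical and is not part of es_dump_restore.

```shell
# Hypothetical helper: succeeds only if the directory given as $1 exists
# and the current user has permission to write to it.
can_write_dump_dir() {
  [ -d "$1" ] && [ -w "$1" ]
}

# Usage (on the Elasticsearch box):
#   can_write_dump_dir /var/es_dump && es_dump http://localhost:9200 /var/es_dump
```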

The env-sync-and-backup job creates daily copies of the Elasticsearch snapshots and stores them in S3 for 5 days.

Restoring a backup

Before restoring a backup, make sure you are monitoring the cluster.

To view the current status of the indices from inside the Elasticsearch instance:

curl http://localhost:9200/_cat/indices
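
While a restore is running it can help to pick out only the indices that are not healthy. The sketch below assumes the first column of the _cat/indices output is the index health (green, yellow or red), which is the default layout; the function name non_green_indices is hypothetical.

```shell
# Hypothetical helper: read `_cat/indices` output on stdin and print only
# the lines whose first column (health) is not "green".
non_green_indices() {
  awk '$1 != "green"'
}

# Usage:
#   curl -s http://localhost:9200/_cat/indices | non_green_indices
```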

Restoring to an index that already exists is additive: it doesn’t replace the existing data or delete any documents which don’t exist in the dump. This means that if the import fails (as it sometimes does), you can simply re-run the command.

Given a directory of dumps named after their respective indices, you can restore them using the same steps as the environment data sync script:

# Derive the alias from the dump's file name, then build a timestamped index name.
alias_name="$(basename "$backupfile" .zip)"
iso_date="$(date --iso-8601=seconds | cut --bytes=-19 | tr '[:upper:]' '[:lower:]')z"
real_name="$alias_name-$iso_date-00000000-0000-0000-0000-000000000000"

# Restore the dump into the new index and point the alias at it.
es_dump_restore restore_alias "http://localhost:9200/" "$alias_name" "$real_name" "$backupfile" '{}' $BATCH_SIZE

This will restore each backup into a new index, and then move the alias to point to it.
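
To apply these steps to a whole directory of .zip dumps, the commands can be wrapped in a loop. This is a sketch rather than the sync script itself: the function names real_index_name and restore_all_dumps are hypothetical, and BATCH_SIZE is assumed to be set in the environment beforehand.

```shell
# Hypothetical helper: build the timestamped index name for a dump file.
# $1 = path to the dump (<alias>.zip), $2 = timestamp such as 2017-01-01t12:00:00z
real_index_name() {
  printf '%s-%s-00000000-0000-0000-0000-000000000000\n' "$(basename "$1" .zip)" "$2"
}

# Hypothetical helper: restore every .zip dump found in the directory $1.
# The body runs in a subshell so its variables do not leak into the caller.
restore_all_dumps() (
  iso_date="$(date --iso-8601=seconds | cut --bytes=-19 | tr '[:upper:]' '[:lower:]')z"
  for backupfile in "$1"/*.zip; do
    [ -e "$backupfile" ] || continue   # the glob matched nothing
    alias_name="$(basename "$backupfile" .zip)"
    es_dump_restore restore_alias "http://localhost:9200/" "$alias_name" \
      "$(real_index_name "$backupfile" "$iso_date")" "$backupfile" '{}' \
      "${BATCH_SIZE:?set BATCH_SIZE before restoring}"
  done
)

# Usage (with BATCH_SIZE already exported):
#   restore_all_dumps /var/es_dump
```

Using one timestamp for the whole run gives every restored index the same date in its name, which makes the indices from one restore easy to spot later.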

If you need to change the alias back for any reason, you can run the rummager rake task rummager:switch_to_named_index[foo-2017-01-01...], where foo-2017-01-01... is the name of the index you want to point the alias to. The task will automatically determine the correct alias name from the index name.
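
Index names created by the restore steps consist of the alias, a timestamp and a UUID-style suffix, so the alias can be recovered by stripping the last two parts. The sketch below only illustrates that idea; it is not the rake task's implementation, and the function name alias_for_index is hypothetical.

```shell
# Hypothetical helper: strip the "-<timestamp>z-<uuid>" suffix from an
# index name, leaving the alias it was created for.
alias_for_index() {
  printf '%s\n' "$1" |
    sed -E 's/-[0-9]{4}-[0-9]{2}-[0-9]{2}t[0-9]{2}:[0-9]{2}:[0-9]{2}z-[0-9a-f-]{36}$//'
}

# Usage:
#   alias_for_index "foo-2017-01-01t12:00:00z-00000000-0000-0000-0000-000000000000"
```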

Once backups have been restored, you must delete the old indices manually; otherwise Elasticsearch will eventually run out of disk space and memory. It’s OK to keep an old index around for a few days in case you need to roll back. You can delete all the unaliased indices by running the rummager task: rummager:clean.
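
Before deleting anything, it can be useful to see which indices no alias points to. The following sketch assumes the cat APIs' h parameter is used to print only the index column; the function name unaliased_indices and the temporary file paths are hypothetical, and this is not how rummager:clean is implemented.

```shell
# Hypothetical helper: print the index names listed in file $1 that do not
# appear in file $2 (the indices that at least one alias points to).
unaliased_indices() {
  awk 'FILENAME == ARGV[1] { aliased[$1] = 1; next }
       NF && !($1 in aliased) { print $1 }' "$2" "$1"
}

# Usage:
#   curl -s 'http://localhost:9200/_cat/indices?h=index' > /tmp/all_indices
#   curl -s 'http://localhost:9200/_cat/aliases?h=index' > /tmp/aliased_indices
#   unaliased_indices /tmp/all_indices /tmp/aliased_indices
```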

Replaying rummager traffic

By restoring an older backup, you will lose any documents that have been updated since the backup was taken.

After restoring a backup, follow "Replaying traffic to correct an out of sync search index" to bring the search index back in sync with the publishing apps.

Elasticsearch 5.x support

es_dump_restore does not support Elasticsearch 5.x. This applies to both creating a backup and restoring from a backup.

This page was last reviewed on 3 January 2019. It needs to be reviewed again on 3 April 2019 by the page owner #govuk-2ndline.