Restore Elasticsearch indices from backup
Background
AWS Managed Elasticsearch automatically takes hourly snapshots for backup and
disaster recovery purposes. The snapshot data is stored in an Amazon-owned S3
bucket that is not directly available to us via S3 but is configured as an
Elasticsearch snapshot repository called cs-automated-enc.
Restores are done via the Elasticsearch API, by making HTTP requests to the
_snapshot endpoint.
We also have a govuk-production snapshot repository, which is normally only
used for copying indices from production to the non-production environments.
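As a quick check that both repositories are registered, a request to the _snapshot endpoint with no repository name returns every configured repository as a top-level JSON key. The following is a minimal sketch under that assumption; has_repository is a hypothetical helper, not part of any existing tooling, and it does a plain text match rather than parsing JSON.

```shell
# Hypothetical helper: succeed if the named repository appears as a key in
# the JSON returned by GET /_snapshot (read from stdin).
# This is a plain text match, not a JSON parser, so treat it as a rough check.
has_repository() {
  grep -q "\"$1\"[[:space:]]*:"
}

# Intended use (needs cluster access, so it is left as a comment here):
#   k exec deploy/search-api -- sh -c 'curl -s "$ELASTICSEARCH_URI/_snapshot"' \
#     | has_repository cs-automated-enc && echo "cs-automated-enc is registered"
```

Keeping the check as a filter on stdin means it can be tried against a saved response before being pointed at a live cluster.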
Restore a specific index from a snapshot
1. List the available backup snapshots in the cs-automated-enc snapshot repository and identify the snapshot that you want to restore from.
k exec deploy/search-api -- sh -c 'curl "$ELASTICSEARCH_URI/_snapshot/cs-automated-enc/_all?pretty"'
This can take a few seconds.
2. If an index already exists with the same name as the one you want to restore, delete the existing index.
k exec deploy/search-api -- sh -c 'curl -XDELETE "$ELASTICSEARCH_URI/<index-name>"'
3. Restore the index from the snapshot. Fill in <snapshot-id> and <index-name> as appropriate.
k exec deploy/search-api -- sh -c 'curl -XPOST -H "Content-Type: application/json" "$ELASTICSEARCH_URI/_snapshot/cs-automated-enc/<snapshot-id>/_restore" -d "{\"indices\": \"<index-name>\"}"'
4. The restore can take a few minutes. The /_cat/recovery resource gives an indication of progress.
k exec deploy/search-api -- sh -c 'curl "$ELASTICSEARCH_URI/_cat/recovery"'
5. Once the restore has finished, reprocess any content changes that happened after the backup.
The reprocessing step is necessary in order to bring the restored index up to date, because GOV.UK’s indexing is incremental only. In other words, there is no regular full reindex.
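The parsing and request-building parts of the steps above can be sketched as small shell helpers. This is a minimal sketch, not the team's tooling: the function names and sample values are hypothetical, and it assumes the standard Elasticsearch _cat/snapshots and _cat/recovery endpoints, with the h= column selector, are reachable on the managed domain. Picking the most recent successful snapshot is only a default; choose a different one if the situation calls for it.

```shell
# Hypothetical helpers for the single-index restore steps.
# They only parse or build text, so they can be tried without a cluster.

# Print the id of the most recent snapshot whose status is SUCCESS.
# Expects stdin in the format produced by:
#   curl "$ELASTICSEARCH_URI/_cat/snapshots/cs-automated-enc?h=id,status,end_epoch"
latest_successful_snapshot() {
  awk '$2 == "SUCCESS" && $3 + 0 > max { max = $3 + 0; id = $1 }
       END { if (id != "") print id }'
}

# Build the JSON body for a restore request limited to one index.
restore_body() {
  printf '{"indices": "%s"}\n' "$1"
}

# Succeed while any shard of the given index has not reached the "done" stage.
# Expects stdin in the format produced by:
#   curl "$ELASTICSEARCH_URI/_cat/recovery?h=index,stage"
restore_in_progress() {
  awk -v idx="$1" '$1 == idx && $2 != "done" { pending = 1 }
                   END { exit pending ? 0 : 1 }'
}
```

For example, piping the _cat/recovery output into restore_in_progress inside a loop with a short sleep gives a simple wait-until-restored check before starting the reprocessing step.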
Restore all indices from a snapshot
Restoring all indices is a similar procedure to restoring a specific index.
1. Identify the snapshot to restore. See step 1 of "Restore a specific index from a snapshot".
2. Delete all indices.
k exec deploy/search-api -- sh -c 'curl -XDELETE "$ELASTICSEARCH_URI/_all"'
3. Restore all indices from the snapshot.
k exec deploy/search-api -- sh -c 'curl -XPOST "$ELASTICSEARCH_URI/_snapshot/cs-automated-enc/<snapshot-id>/_restore"'
4. Once the restore has finished, reprocess recent content changes to bring the indices up to date. See steps 4 and 5 of "Restore a specific index from a snapshot".
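After a full restore it can be worth confirming that every index named in the snapshot exists again. The following is a rough sketch, assuming the expected names are copied from the indices array in the snapshot listing and the current names come from the standard _cat/indices?h=index endpoint; missing_indices is a hypothetical helper and the index names in the usage comment are placeholders.

```shell
# Hypothetical helper: print each expected index name that is absent from
# the list of indices currently on the cluster.
#   $1: expected index names, one per line
#   $2: index names present now, one per line
missing_indices() {
  printf '%s\n' "$1" | while IFS= read -r name; do
    # Skip blank lines in the expected list.
    [ -n "$name" ] || continue
    # -F: fixed string, -x: whole-line match, -q: quiet.
    printf '%s\n' "$2" | grep -Fqx -- "$name" || printf '%s\n' "$name"
  done
}

# Intended use (needs cluster access, so it is left as a comment here):
#   expected="index-a
#   index-b"
#   actual=$(k exec deploy/search-api -- sh -c 'curl -s "$ELASTICSEARCH_URI/_cat/indices?h=index"')
#   missing_indices "$expected" "$actual"
```

Empty output means nothing is missing; any names printed are indices that were in the expected list but have not been restored.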
Further reading
See "Restoring snapshots" in the AWS Managed Elasticsearch/OpenSearch documentation.