Reindex an Elasticsearch index
After updating an Elasticsearch index’s schema by changing the fields or document types, you need to reindex the affected index before the new fields and types can be used.
The reindexing process:
- Locks the Elasticsearch index to prevent writes to the index while data is being copied
- Creates a new index using the schema defined in the deployed version of search-api
- Copies all the data from the old to the new index
- Compares the old and new data to check for inconsistencies
- If everything looks the same, switches the alias to the new index
You don’t need to reindex if you have changed the govuk_document_types gem. Instead, run the rake task search:update_supertypes to update documents in place. This can be done during working hours.
How to reindex an Elasticsearch index
Do not reindex on production during working hours except in an emergency. Reindexing locks the index for writes, so content is not updated in the search index while it runs. See the out-of-date search indices section below if you need to reindex during working hours. Reindexing takes around 2 hours to complete.
To reindex, you can use the following job:
- Run search_api_reindex_with_new_schema on Integration
- Run search_api_reindex_with_new_schema on Staging
- ⚠️ Run search_api_reindex_with_new_schema on Production ⚠️
This task allows either all indices or a single index to be reindexed.
To monitor progress, SSH to a search box and check how many documents have been copied to the new index:
gds govuk connect ssh -e integration search
govuk_setenv search-api \
  bash -c 'curl "$ELASTICSEARCH_URI/_cat/indices?v"'
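To compare the old and new index at a glance, the output of the command above can be filtered down to index names and document counts. The following is a minimal sketch rather than part of search-api: the function name doc_counts is hypothetical, and it assumes the standard header row that `_cat/indices?v` prints (including the `index` and `docs.count` columns).

```shell
# Hypothetical helper: reads `_cat/indices?v` output on stdin and prints
# "<index name> <document count>" for each index whose name starts with
# the prefix given as the first argument.
doc_counts() {
  awk -v prefix="$1" '
    # The first line is the header row: remember the position of each column.
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    # Keep only rows whose index name starts with the prefix.
    index($(col["index"]), prefix) == 1 {
      print $(col["index"]), $(col["docs.count"])
    }
  '
}
```

On the search box, pipe the curl output into the function, replacing alias_of_index with the alias you are reindexing:

```shell
govuk_setenv search-api \
  bash -c 'curl -s "$ELASTICSEARCH_URI/_cat/indices?v"' | doc_counts alias_of_index
```

The new index is fully copied when its document count stops growing and is close to the count of the old index.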
Out-of-date search indices
If any content has been published, updated or removed during the reindexing job, then the search index will be out-of-date.
See Fix out-of-date search indices for details.
Reindexing does not delete the old index. This lets us switch back to the old index if there is a serious problem with the new one.
Once you’re confident that the reindexing was successful, delete the old (unaliased) index using the search:clean rake task:
- Run search:clean SEARCH_INDEX=alias_of_index_to_clean_up on Integration
- Run search:clean SEARCH_INDEX=alias_of_index_to_clean_up on Staging
- ⚠️ Run search:clean SEARCH_INDEX=alias_of_index_to_clean_up on Production ⚠️
Avoid leaving old indices around for more than a few days. If enough old indices hang around, we may hit space limitations and be unable to index new documents.
However, if multiple copies of the same index are left behind, an automated clean-up task removes any extra indices over a given age:
rake search:timed_clean MAX_INDEX_AGE=number_of_days SEARCH_INDEX=alias_of_index_to_clean_up
This runs as a Jenkins job that removes any index more than 7 days old. It always leaves at least one inactive index (typically the newest one created) alongside the active index for backup purposes.
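The retention rule described above can be illustrated with a small sketch. This is not the implementation of search:timed_clean; it only models the rule as stated here, and both the function name indices_to_clean and the input format (one line per index: name, age in days, and active or inactive) are assumptions made for the example.

```shell
# Hypothetical model of the clean-up rule.
# stdin: one line per index for a single alias: "<name> <age_in_days> <active|inactive>"
# $1:    maximum age in days
# Prints the names of the indices that the rule would delete.
indices_to_clean() {
  # Sort by age ascending so the newest index comes first.
  sort -k2,2n | awk -v max_age="$1" '
    $3 == "active" { next }      # never delete the active index
    !kept { kept = 1; next }     # keep the newest inactive index as a backup
    $2 > max_age { print $1 }    # delete other inactive indices past the age limit
  '
}
```

For example, with an active index and inactive copies that are 3, 9 and 20 days old, a limit of 7 days would delete the 9 and 20 day old copies and keep the 3 day old one as the backup.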
Failed to switch to new index
The final part of the reindex is to switch Elasticsearch over to the newly created indices. We’ve noticed recently that this isn’t always successful. It appears that if content is written to the database while the reindex task is running, the task fails at the end because it detects a difference in the data.
Re-running the reindex task usually fixes this.
To stop the reindexing job
If you need to cancel the reindexing while it’s in progress:
- Stop the reindexing rake task
Unlock the old index by running the
This doesn’t actually stop the reindexing, because reindexing is an internal Elasticsearch process triggered by the rake task. It does stop the rake task from switching the alias over to the new index once all the data has been copied, which is normally good enough.
If you need to stop the reindexing process itself, for example because Elasticsearch is about to run out of disk space, connect to the search box (see above) then:
Find the ID of the reindexing task:
govuk_setenv search-api \
  bash -c 'curl "$ELASTICSEARCH_URI/_tasks?actions=%2Areindex&pretty"'
Stop the task:
govuk_setenv search-api \
  bash -c 'curl -XPOST "$ELASTICSEARCH_URI/_tasks/<task_id>/_cancel"'
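The task ID to use in the cancel request is the key of the task object in the `_tasks` response, in the form node_id:task_number. As a convenience, the sketch below pulls those keys out of the JSON with grep. The function name reindex_task_ids is hypothetical, and the pattern assumes node IDs contain only letters, digits, hyphens and underscores.

```shell
# Hypothetical helper: reads a `_tasks` JSON response on stdin and prints
# each task ID, one per line.
reindex_task_ids() {
  # Task IDs appear as object keys of the form "<node_id>:<number>" : { ... }
  grep -oE '"[A-Za-z0-9_-]+:[0-9]+"[[:space:]]*:[[:space:]]*\{' | cut -d'"' -f2
}
```

On the search box, it can be combined with the lookup command:

```shell
govuk_setenv search-api \
  bash -c 'curl -s "$ELASTICSEARCH_URI/_tasks?actions=%2Areindex&pretty"' | reindex_task_ids
```

Check the full response before cancelling if more than one ID is printed, so that you cancel the right task.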
To switch back to the old index
If you discover a problem after reindexing and need to switch back to the old index, run the search:switch_to_named_index rake task:
- Run search:switch_to_named_index[full_index_name] SEARCH_INDEX=index_alias on Integration
- Run search:switch_to_named_index[full_index_name] SEARCH_INDEX=index_alias on Staging
- ⚠️ Run search:switch_to_named_index[full_index_name] SEARCH_INDEX=index_alias on Production ⚠️
full_index_name is the full name of the new index, including the date
and UUID, e.g.
Switching back to an old index means that you’ll lose any content updates that were published while the new index was live. To fix this, replay traffic from both publishing-api and Whitehall.