Skip to main content
Last updated: 29 Jan 2024

Reindex an Elasticsearch index

When to reindex

After updating an Elasticsearch index’s schema by changing the fields or document types, you need to reindex the affected index before the new fields and types can be used.

⚠️ Avoid reindexing in production during working hours, except to fix an outage.

Reindexing locks the index for writes, so content is not updated in the search index. Reindexing takes around 2 hours to complete.

See Fix out-of-date search indices if you need to reindex during working hours.

You don’t need to reindex when making a change to the govuk_document_types gem. Instead, run the Rake task search:update_supertypes to update documents in-place. Updating supertypes can be done during working hours.

What the reindexing process does

  1. Locks the Elasticsearch index to prevent writes to the index while data is being copied
  2. Creates a new index using the schema defined in the deployed version of search-api
  3. Copies all the data from the old to the new index
  4. Compares the old and new data to check for inconsistencies
  5. If everything looks the same, switches the index alias to the new index

Reindex an Elasticsearch index

1. Preflight checks

Make sure that you have sufficient space to complete an index migration.

You can check this in:

If the free storage is below 250GB, check to see if there are any old indices from a previous migration that have not been cleaned up.

Check for old indices

Sign in as a poweruser:

eval $(gds aws govuk-production-poweruser -e --art 8h)

List the search indices (sorting by index name):

k exec deploy/search-api -- sh -c 'curl -s "$ELASTICSEARCH_URI/_cat/indices?v&s=i"'

You should see a table listing the indices. There should be one each of detailed, government, govuk, metasearch and page-traffic (plus others which are not affected by this task).

Example healthy output (columns trimmed for brevity):

health status index
green  open   .kibana_1
green  open   .tasks
green  open   detailed-2023-01-01t21-09-52z-0630e696-9d7f-
green  open   government-2023-01-01t19-46-41z-e44c324c-644
green  open   govuk-2023-01-01t23-09-56z-2a4ec293-2123-40f
green  open   licence-finder
green  open   metasearch-2023-01-01t21-41-47z-4a398b23-8bb
green  open   page-traffic-2023-01-01t04-32-04z-cd2264ce-3

If you see multiple indices which start with the same name, they probably need to be cleaned up. Check the time stamp on the index name. Anything older than a week is likely to have been forgotten by the last person.

If you think you can clean up some indices, please do so:

k exec deploy/search-api -- rake search:clean SEARCH_INDEX=alias_of_index_to_clean_up

The alias_of_index_to_clean_up is the part of the name before the date/time stamp: govuk, government, detailed, metasearch, page-traffic, or alternatively all.

2. Run the reindex

To reindex all indices, run the search:migrate_schema Rake task on search-api:

k exec deploy/search-api -- rake search:migrate_schema

To reindex a specific index, specify the SEARCH_INDEX environment variable:

k exec deploy/search-api -- rake SEARCH_INDEX=government search:migrate_schema

To monitor progress, you can check how many documents have been copied to the new index:

k exec deploy/search-api -- sh -c 'curl "$ELASTICSEARCH_URI/_cat/indices?v"'

2. Clean up

Reindexing does not delete the old index. This lets us switch back to the old index if there is a serious problem with the new one.

Once you’re confident that the reindexing was successful, delete the old (unaliased) index using the search:clean Rake task:

k exec deploy/search-api -- rake search:clean SEARCH_INDEX=alias_of_index_to_clean_up

Avoid leaving old indices around, otherwise we may run into storage limitations and be unable to index new documents.

Troubleshooting

Failed to switch to new index

The final part of the reindex is to switch Elasticsearch over to the newly created indexes. We’ve noticed recently that this isn’t always successful. It appears that if content is written to the database while the reindex task is running, the task will fail at the end as it detects a difference in the data.

Re-running the reindex task usually fixes this.

To stop the reindexing job

If you need to cancel the reindexing while it’s in progress:

  1. Find the ID of the reindexing task:

    k exec deploy/search-api -- sh -c 'curl "$ELASTICSEARCH_URI/_tasks?actions=%2Areindex&pretty"'
    
  2. Stop the task:

    k exec deploy/search-api -- sh -c 'curl -XPOST "$ELASTICSEARCH_URI/_tasks/<task_id>/_cancel"'
    
  3. Unlock the old index by running the search:unlock Rake task:

    k exec deploy/search-api -- rake search:unlock SEARCH_INDEX=alias_of_index_to_unlock
    

To switch back to the old index

If you discover a problem after reindexing and need to switch back to the old index, run the search:switch_to_named_index Rake task:

k exec deploy/search-api -- rake search:switch_to_named_index[full_index_name] SEARCH_INDEX=index_alias

where full_index_name is the full name of the new index, including the date and UUID, e.g. govuk-2018-01-29t17:08:21z-31f39bdb-c62b-4607-8081-19ea87fb1498.

Switching back to an old index means that you’ll lose any content updates that were published while the new index was live. To fix this, re-present the changes from both publishing-api and Whitehall.