search-api: Decision record: perform popularity updating without using an index lock
The current nightly job works by:
- locking the current index
- queuing all records from the index
- starting threaded workers to insert the records into the new index with the updated popularity
- waiting for the workers to finish
- switching to the new index
- releasing the index lock
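The steps above can be sketched roughly as follows. This is a minimal Python model for illustration, not the project's code: plain dicts stand in for the old and new indices, a `threading.Lock` stands in for the index lock, and `rebuild_with_lock`, `popularity` and `workers` are hypothetical names.

```python
import queue
import threading


def rebuild_with_lock(old_index, popularity, workers=4):
    """Build a new index from old_index with refreshed popularity figures."""
    index_lock = threading.Lock()   # stands in for the index lock
    result_guard = threading.Lock() # protects the dict shared by the workers
    new_index = {}
    pending = queue.Queue()

    def worker():
        # Drain the queue; every record was queued before the workers started,
        # so an empty queue means there is nothing left to do.
        while True:
            try:
                doc_id, doc = pending.get_nowait()
            except queue.Empty:
                return
            updated = dict(doc, popularity=popularity.get(doc_id, 0.0))
            with result_guard:
                new_index[doc_id] = updated

    with index_lock:                      # lock the current index
        for item in old_index.items():    # queue all records from the index
            pending.put(item)
        threads = [threading.Thread(target=worker) for _ in range(workers)]
        for thread in threads:            # start the threaded workers
            thread.start()
        for thread in threads:            # wait for the workers to finish
            thread.join()
        live_index = new_index            # switch to the new index
    # leaving the with-block releases the index lock
    return live_index


old = {
    "/a": {"title": "A", "popularity": 0.1},
    "/b": {"title": "B", "popularity": 0.2},
}
live = rebuild_with_lock(old, {"/a": 0.5, "/b": 0.25})
print(live)
```

Even in this reduced form, the job has to manage its own queue, worker threads and two locks, and any write arriving while `index_lock` is held would have to wait for the whole run.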
Because the index is locked for as long as the process runs, we prefer to run it out of hours so that it doesn't block the insertion of new content.
The index locking is required so that document edits during the reindex process are blocked from writing to the old index, as they would then be discarded. The lock blocks the write until the process has completed and switched to the new index.
This in turn means we need to create a new index to write the updated popularity data to, as the old index is currently locked.
The code is quite complicated as it has to manage a threaded workload, which is difficult to test and reason about.
As part of building the new govuk index, we want to simplify this process and ideally avoid the index locking. This means we can use sidekiq's concurrency implementation instead of having our own, as we no longer need to be able to unlock the index at the end of the process.
Using external versioning
If we don't use index locking and switching, and instead just overwrite the existing document with a copy of itself carrying the updated popularity figure, the following scenario outlines the possible outcomes.
Suppose we have a piece of content called DocumentA, which is currently in the index with a version of 5. The popularity update process will create a job to update this content (let's call this job-A) with the popularity figure for today (let's call this pop-today).
If we get an update on DocumentA while the popularity update process is running (let's call this update-A), it will be generated with a popularity figure equal to pop-today, the same as what is in job-A.
This can result in one of two things occurring:
- update-A is processed before job-A - in this case job-A will be ignored as it is for an earlier version of DocumentA, leaving the values from update-A in the search index
- job-A is processed before update-A - in this case both updates will occur, leaving the values from update-A in the search index
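The two orderings can be replayed against a small in-memory model. This is only a sketch under stated assumptions: the `Index` class, the `run` helper, the version 6 given to update-A, the titles and the `pop-yesterday` starting value are all invented for illustration. The version check is modelled on Elasticsearch's `external` (strictly higher) and `external_gte` (equal or higher) version types.

```python
class VersionConflict(Exception):
    """Raised when a write carries a version the index will not accept."""


class Index:
    """Toy index that checks an externally supplied version on every write."""

    def __init__(self, version_type="external_gte"):
        self.version_type = version_type
        self.docs = {}  # doc_id -> (version, body)

    def write(self, doc_id, version, body):
        stored = self.docs.get(doc_id)
        if stored is not None:
            if self.version_type == "external":
                accepted = version > stored[0]   # strictly higher only
            else:
                accepted = version >= stored[0]  # equal or higher
            if not accepted:
                raise VersionConflict(f"{doc_id}: {version} vs {stored[0]}")
        self.docs[doc_id] = (version, body)


def run(order, version_type="external_gte"):
    """Apply job-A and/or update-A in the given order; return what happened."""
    index = Index(version_type)
    index.write("DocumentA", 5, {"title": "original", "popularity": "pop-yesterday"})
    writes = {
        # job-A rewrites the existing document at its current version.
        "job-A": (5, {"title": "original", "popularity": "pop-today"}),
        # update-A is a content edit; version 6 is an assumed next version.
        "update-A": (6, {"title": "edited", "popularity": "pop-today"}),
    }
    applied = []
    for name in order:
        version, body = writes[name]
        try:
            index.write("DocumentA", version, body)
            applied.append(name)
        except VersionConflict:
            pass  # the stale write is ignored
    return applied, index.docs["DocumentA"]


print(run(["update-A", "job-A"]))
print(run(["job-A", "update-A"]))
print(run(["job-A"], version_type="external"))
```

In this model both orderings end with the values from update-A at version 6. The last call shows the strict check rejecting job-A outright, so the stored popularity stays at its old value.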
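This process would fail with version_type set to external, because job-A rewrites DocumentA with the version it already has (5) and external only accepts a version strictly higher than the stored one. It would succeed with version_type set to external_gte, which also accepts an equal version. The elasticsearch documentation does state of external_gte that "If used incorrectly, it can result in loss of data."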
External versioning with multiplier
By multiplying the external version field by a multiplier when it is inserted into elasticsearch, it is then possible to increment the version field each time you do a transient data update. This can then be used as normal with the version_type external, and would skip the transient data update if the content had been updated.
This would mean that with a multiplier of 10,000 we could do a daily update for approximately 27 years (10,000 / 365), and with a multiplier of 100,000 we could do a daily update for approximately 273 years.
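The arithmetic behind the multiplier can be sketched as follows. The helper names `encode_version` and `years_of_daily_updates` are hypothetical, and the sketch assumes one transient update per day and a 365-day year.

```python
DAYS_PER_YEAR = 365


def encode_version(content_version, transient_updates, multiplier):
    """Combine a content version and a transient-update counter into one number."""
    if not 0 <= transient_updates < multiplier:
        raise ValueError("transient update counter has run out of room")
    return content_version * multiplier + transient_updates


def years_of_daily_updates(multiplier):
    """Whole years of daily transient updates that fit under one content version."""
    return multiplier // DAYS_PER_YEAR


print(years_of_daily_updates(10_000))
print(years_of_daily_updates(100_000))

# Content version 5 after three popularity updates, with a multiplier of 10,000.
after_three_updates = encode_version(5, 3, 10_000)
# A content edit moves the document to content version 6.
after_content_edit = encode_version(6, 0, 10_000)
# A popularity job prepared before the edit would try to write the next counter value.
stale_popularity_write = encode_version(5, 4, 10_000)
print(stale_popularity_write > after_content_edit)
```

Because the stale popularity write carries a lower number than the edited document, a strict `external` check would reject it, while each popularity update on unchanged content still carries a strictly higher number than the one before.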
The two options have similar implementations with only minor differences. As a result, we will be going with the easier of the two (using external_gte). As we will only be updating a single field (popularity), the risk of data loss is quite low.
An additional advantage of choosing this approach is that we can use it with indices that aren't currently using external versioning, as the in-place edit with the current version - taken at the start of the reindex process - will be ignored if the content has changed and the version has been automatically incremented.
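A rough model of this flow, for illustration only: the `Index` class, its `edit` and `write_if_not_stale` methods, and `update_popularity` are hypothetical names, and the automatic version increment is simulated in memory rather than taken from any real client. The concurrent edit is assumed to carry today's popularity figure already, as in the scenario described earlier.

```python
class Index:
    """Toy index whose versions are incremented automatically on content edits."""

    def __init__(self):
        self.docs = {}  # doc_id -> {"version": int, "body": dict}

    def edit(self, doc_id, body):
        """Normal content write: the version goes up by one each time."""
        version = self.docs[doc_id]["version"] + 1 if doc_id in self.docs else 1
        self.docs[doc_id] = {"version": version, "body": body}

    def write_if_not_stale(self, doc_id, body, version):
        """In-place write that is ignored when the stored version is newer."""
        if version < self.docs[doc_id]["version"]:
            return False
        self.docs[doc_id] = {"version": version, "body": body}
        return True


def update_popularity(index, snapshot, popularity):
    """Rewrite each snapshotted document with today's popularity figure."""
    applied = []
    for doc_id, (version, body) in snapshot.items():
        new_body = dict(body, popularity=popularity[doc_id])
        if index.write_if_not_stale(doc_id, new_body, version):
            applied.append(doc_id)
    return applied


index = Index()
index.edit("/a", {"title": "A", "popularity": 0.1})
index.edit("/b", {"title": "B", "popularity": 0.2})

# Versions and bodies are captured at the start of the update process.
snapshot = {doc_id: (doc["version"], doc["body"]) for doc_id, doc in index.docs.items()}

# A content edit arrives while the process is running.
index.edit("/b", {"title": "B (edited)", "popularity": 0.25})

applied = update_popularity(index, snapshot, {"/a": 0.5, "/b": 0.25})
print(applied)
print(index.docs)
```

Only `/a` is rewritten here. The write for `/b` carries the version captured before the edit, so it is dropped and the edited document is left untouched, with no lock held at any point.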