How to debug underperforming search
Search is one of the more load-sensitive parts of GOV.UK, as it can’t be cached as effectively as more static pages. There are two significant components involved in search: the search-api application, and the AWS-managed Elasticsearch cluster powering it.
Useful metrics to look at are:
Request duration from finder-frontend to search-api and request duration from search-api to Elasticsearch, both on the Search API / Elasticsearch SLIs dashboard.
If the search-api to Elasticsearch duration has increased, then Elasticsearch may have a capacity issue. If only the finder-frontend to search-api duration has increased, then the capacity issue is more likely to be in search-api itself.
The machine dashboard for search.
The AWS dashboard for Elasticsearch in the AWS console.
This dashboard has a lot of metrics. The ones most likely to point to a capacity issue are the “Index thread pool” and “Search thread pool” graphs: if either is consistently above the red dashed line, requests are queueing. Talk to RE in that case.
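The triage described above can be sketched as a small Python helper. This is only an illustration, not part of search-api or any GOV.UK tooling: the names Durations, likely_bottleneck and consistently_above are hypothetical, the numbers are assumed to be read by hand from the dashboards, and the 1.5x tolerance and 80% fraction are arbitrary starting points rather than agreed thresholds. The threshold passed to consistently_above stands in for the red dashed line, whose actual value has to be read from the AWS console.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Durations:
    """Request durations in seconds, read by hand from the SLIs dashboard."""
    frontend_to_search_api: float
    search_api_to_elasticsearch: float


def likely_bottleneck(baseline: Durations, current: Durations,
                      tolerance: float = 1.5) -> str:
    """Guess which component to investigate first.

    A duration counts as "increased" when it exceeds the baseline by more
    than the (assumed) tolerance factor.
    """
    es_increased = (current.search_api_to_elasticsearch
                    > baseline.search_api_to_elasticsearch * tolerance)
    api_increased = (current.frontend_to_search_api
                     > baseline.frontend_to_search_api * tolerance)

    # Elasticsearch latency is part of every search-api request, so check it
    # first: if it has risen, the outer duration will usually rise as well.
    if es_increased:
        return "elasticsearch"
    if api_increased:
        return "search-api"
    return "none"


def consistently_above(samples: Sequence[float], threshold: float,
                       fraction: float = 0.8) -> bool:
    """True if at least `fraction` of the samples are above `threshold`.

    `samples` would be thread pool queue readings over a time window and
    `threshold` the value of the red dashed line on the graph.
    """
    if not samples:
        return False
    above = sum(1 for sample in samples if sample > threshold)
    return above / len(samples) >= fraction
```

For example, with a baseline of Durations(0.2, 0.05) and current readings of Durations(0.6, 0.3), likely_bottleneck returns "elasticsearch", which is the point at which the thread pool graphs are worth checking with consistently_above.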