Debug underperforming search
Search is one of the more load-sensitive parts of GOV.UK, as it can’t be cached as effectively as more static pages. The two most significant components involved in search are the search-api application and the AWS-managed Elasticsearch cluster that powers it. The search-api application also calls SageMaker to rerank results for queries sorted by relevance.
Useful metrics to look at are:
Request duration from finder-frontend to search-api, on the finder-frontend app dashboard
If this has increased, search-api may not have enough capacity.
Request duration from search-api to Elasticsearch and SageMaker, on the search-api app dashboard
See the “<thing> req count vs latency” graphs:
- Reranker: if this has increased, queries sorted by relevance (keyword searches) will be slower. This could indicate a performance issue with SageMaker.
- Search: if this has increased, all queries will be slower. This could indicate a performance issue with Elasticsearch.
- Spelling suggestion: if this has increased, finder-frontend pages will be slower. Other search-powered pages, like taxon pages, are not affected. This could indicate a performance issue with Elasticsearch.
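The reasoning in the graphs above amounts to a small decision table: work out which latency graph went up, then read off which component it points at. The sketch below encodes that table. The graph keys in `SUSPECTS` are hypothetical names (the real dashboards label these graphs differently), and "increased" is assumed here to mean the recent median latency is at least 1.5 times a baseline median; pick whatever windows and ratio suit the incident.

```python
from statistics import median

# Hypothetical graph keys mapped to the component an increase points at.
SUSPECTS = {
    "finder_frontend_to_search_api": "search-api capacity",
    "reranker": "SageMaker (relevance-sorted / keyword searches only)",
    "search": "Elasticsearch (all queries)",
    "spelling_suggestion": "Elasticsearch (finder-frontend pages only)",
}


def increased(baseline_ms, recent_ms, ratio=1.5):
    """True if the recent median latency is at least `ratio` times the baseline median."""
    if not baseline_ms or not recent_ms:
        raise ValueError("each window needs at least one sample")
    return median(recent_ms) >= ratio * median(baseline_ms)


def triage(windows, ratio=1.5):
    """Map each graph whose latency increased to the component it points at.

    `windows` maps a key from SUSPECTS to a (baseline_ms, recent_ms) pair of
    latency samples, e.g. read off the dashboard for two time ranges.
    """
    return {
        name: SUSPECTS[name]
        for name, (baseline, recent) in windows.items()
        if increased(baseline, recent, ratio)
    }


if __name__ == "__main__":
    print(triage({
        "search": ([40, 45, 50], [120, 130, 125]),
        "reranker": ([200, 210, 190], [205, 215, 195]),
    }))
    # prints {'search': 'Elasticsearch (all queries)'}
```

Using the median rather than the mean keeps a single slow outlier from flagging a graph on its own.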
The machine dashboard for search.
The AWS dashboard for Elasticsearch in the AWS console.
This dashboard shows many metrics. A capacity issue may be indicated by the “Index thread pool” or “Search thread pool” graphs staying consistently above the red dashed line, which means that requests are queueing. If so, talk to RE.
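If you can query the cluster directly, the same queueing signal is available from the Elasticsearch nodes stats API (`GET _nodes/stats/thread_pool`), which reports a `queue` count per thread pool per node. The sketch below assumes you have already fetched several of those responses over time (fetching is left out, because requests to an AWS-managed domain may need signing), and treats a pool as "consistently queueing" if its queue was non-empty in at least 80% of snapshots. That threshold is an assumption and is not the same as the dashboard's red dashed line. The index pool is named `index` on older Elasticsearch versions and `write` on newer ones, so both are checked.

```python
from collections import defaultdict

# "index" exists on older Elasticsearch versions, "write" on newer ones.
POOLS = ("search", "index", "write")


def queueing_pools(snapshots, pools=POOLS, min_fraction=0.8):
    """Return {(node_name, pool): fraction} for pools whose queue was
    non-empty in at least `min_fraction` of the snapshots.

    Each snapshot is a parsed JSON response from GET _nodes/stats/thread_pool.
    """
    if not snapshots:
        raise ValueError("need at least one snapshot")
    queued = defaultdict(int)
    for snap in snapshots:
        for node in snap["nodes"].values():
            for pool in pools:
                stats = node["thread_pool"].get(pool)
                # A positive queue count means requests are waiting for a thread.
                if stats and stats["queue"] > 0:
                    queued[(node["name"], pool)] += 1
    total = len(snapshots)
    return {
        key: count / total
        for key, count in queued.items()
        if count / total >= min_fraction
    }
```

An empty result means no pool was queueing often enough to count as "consistent"; any entries are worth raising with RE alongside the dashboard graphs.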
The AWS dashboard for SageMaker in the AWS console.
A capacity issue could be suggested by the CPU utilisation graph being consistently close to 100%.
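As a rough way to turn "consistently close to 100%" into a check, the sketch below takes a list of `CPUUtilization` datapoints and reports saturation if most of them sit near the ceiling. For SageMaker endpoint instances, CloudWatch reports this metric as the sum across CPU cores (so four cores can read up to 400%), which is why the sketch divides by a core count you supply. The 90% "near" threshold and the 80% "most of the time" fraction are assumptions, not values from this runbook.

```python
def cpu_saturated(cpu_percent, cores=1, near=90.0, min_fraction=0.8):
    """Return True if per-core CPU utilisation is at least `near` percent
    in at least `min_fraction` of the datapoints.

    `cpu_percent` holds raw CPUUtilization datapoints, which SageMaker sums
    across cores (up to 100% x cores), so each value is divided by `cores`.
    """
    if not cpu_percent:
        raise ValueError("need at least one datapoint")
    per_core = [value / cores for value in cpu_percent]
    high = sum(1 for value in per_core if value >= near)
    return high / len(per_core) >= min_fraction
```

For example, `cpu_saturated([380, 390, 395, 370], cores=4)` is `True` (every datapoint is above 90% per core), while `cpu_saturated([95, 40, 50, 97])` is `False` because only half the datapoints are high, which looks like spikes rather than sustained saturation.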