This page describes what to do in case of an Icinga alert. For more information, you can search the govuk-puppet repo for the source of the alert.

Search API app healthcheck not ok

The Search API has a healthcheck endpoint which provides information about the current system status.

When the healthcheck returns a ‘warning’ or ‘critical’ status, find the particular check that caused the alert, then follow the actions below.

Note: The healthcheck endpoint is not publicly available.
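To pick out the check that caused the alert, a small script can help when the response is long. The sketch below is a minimal Python example that assumes the endpoint returns JSON with an overall "status" and a "checks" object keyed by check name, each entry having its own "status" and an optional "message". That shape and the check names in the example are assumptions, so confirm them against a real response before relying on the script.

```python
import json


def failing_checks(healthcheck_body: str) -> dict:
    """Return {check_name: (status, message)} for every check that is not "ok".

    Assumes a response shaped like (hypothetical):
      {"status": "critical",
       "checks": {"some_check": {"status": "critical", "message": "..."}}}
    """
    payload = json.loads(healthcheck_body)
    failures = {}
    for name, check in payload.get("checks", {}).items():
        status = check.get("status", "unknown")
        # Anything other than "ok" (e.g. "warning" or "critical") is reported.
        if status != "ok":
            failures[name] = (status, check.get("message", ""))
    return failures


if __name__ == "__main__":
    # Hypothetical response body; real check names may differ.
    example = json.dumps({
        "status": "critical",
        "checks": {
            "redis_connectivity": {"status": "ok"},
            "sidekiq_queue_latency": {"status": "critical",
                                      "message": "queue latency too high"},
        },
    })
    print(failing_checks(example))
```

Because the endpoint is not public, run this from a machine that can reach it, passing in the body returned by the healthcheck request.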

Redis connectivity is not OK

The Sidekiq queue (which uses Redis as a data store) contains documents to be indexed. The Search API takes jobs off the queue and adds them to the search indexes.

We use Amazon ElastiCache, which provides managed Redis instances.

If the Search API cannot connect to Redis, new editions of documents that are added to the queue will not enter the search indexes. While the issue is ongoing, new editions won't appear in search results.

Moreover, if the Search API doesn't take jobs off the Sidekiq queue while jobs continue to be added to it (by the publishing-api), Redis can run out of memory.
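To gauge how close Redis is to running out of memory, you can compare used_memory with maxmemory in the output of "redis-cli -h <host> INFO memory". The Python sketch below parses that text and returns the ratio. It assumes the node reports a non-zero maxmemory; the host placeholder and the sample values in the test are illustrative only.

```python
from typing import Optional


def parse_info(info_text: str) -> dict:
    """Parse the "key:value" lines of Redis INFO output into a dict of strings."""
    fields = {}
    for line in info_text.splitlines():
        line = line.strip()
        # Section headers start with "#"; skip them and blank lines.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def memory_usage_ratio(info_text: str) -> Optional[float]:
    """Return used_memory / maxmemory, or None if no limit is reported."""
    fields = parse_info(info_text)
    used = int(fields["used_memory"])
    limit = int(fields.get("maxmemory", "0"))
    if limit == 0:
        # A maxmemory of 0 means no explicit limit is configured.
        return None
    return used / limit
```

A ratio that keeps climbing while the Sidekiq queue grows suggests the Search API is not draining jobs fast enough.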

How do I investigate this?

You’ll need to find out why the Search API can’t connect to Redis.

General tips: reproduce the connectivity issue, check the application logs, and look at the Redis cluster (ElastiCache) in the AWS console.
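One way to reproduce the connectivity issue from an application machine is "redis-cli -h <host> PING", which should print PONG. If redis-cli isn't installed, the Python sketch below sends the same PING over a raw socket using the Redis wire protocol (RESP). It assumes a plaintext connection with no password; if the ElastiCache cluster uses in-transit encryption or AUTH, this check will report a failure even when Redis itself is healthy.

```python
import socket


def encode_command(*args: str) -> bytes:
    """Encode a Redis command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        parts.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(parts)


def redis_ping(host: str, port: int = 6379, timeout: float = 2.0) -> bool:
    """Return True if the server answers PING with +PONG, False otherwise.

    Assumes plaintext and no AUTH (see the note above).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(encode_command("PING"))
            reply = sock.recv(64)
    except OSError:
        # DNS failures, refused connections and timeouts all count as "down".
        return False
    return reply.startswith(b"+PONG")
```

For example, redis_ping("<cluster-endpoint>") with the ElastiCache primary endpoint from the AWS console. A False result from the app machine but True from elsewhere points at networking (security groups, DNS) rather than Redis.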

More information about Redis Alerts

Sidekiq queue latency is not OK

This alert triggers when there are jobs in the Sidekiq queue that are waiting too long to be processed. This could mean that documents aren’t appearing in search results after they’ve been published.

The thresholds are set in the Search API GitHub repository.
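Sidekiq measures queue latency as the time since the oldest job still waiting in the queue was enqueued. The Python sketch below mirrors that calculation for a job payload fetched with "redis-cli LINDEX queue:<name> -1" (Sidekiq pushes new jobs onto the head of the list, so the oldest job sits at the tail). It assumes enqueued_at is a Unix timestamp in seconds and that no Redis namespace prefix is in use. The queue name and the warning and critical values are placeholders; the real thresholds live in the Search API repository.

```python
import json
import time
from typing import Optional


def queue_latency(oldest_job_json: Optional[str], now: Optional[float] = None) -> float:
    """Seconds the oldest waiting job has been in the queue.

    oldest_job_json is the raw payload at the tail of the queue list;
    None means the queue is empty, so latency is zero.
    """
    if oldest_job_json is None:
        return 0.0
    now = time.time() if now is None else now
    job = json.loads(oldest_job_json)
    # Assumes enqueued_at is seconds since the epoch.
    return now - job.get("enqueued_at", now)


def latency_status(latency: float, warning: float, critical: float) -> str:
    """Classify a latency against (placeholder) warning/critical thresholds."""
    if latency >= critical:
        return "critical"
    if latency >= warning:
        return "warning"
    return "ok"
```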

How do I investigate this?

The issue could be caused by a temporary spike in publishing activity, or something being wrong with the Search API.

You can check the Sidekiq Grafana dashboard for the Search API. Look at the "Retry set size": a growing retry set could mean that jobs are failing. You can then look at Sentry or Sidekiq Web to see what's going on.
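When the retry set is growing, it helps to see which jobs are failing and why. Each entry in Sidekiq's "retry" sorted set is a JSON job payload that records the job class and the error_class of its last failure, so you can list the entries with "redis-cli ZRANGE retry 0 -1" and summarise them. The Python sketch below is a hypothetical helper for that summary; Sidekiq Web shows the same information if you have access to it.

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def summarise_retries(payloads: Iterable[str]) -> List[Tuple[Tuple[str, str], int]]:
    """Count retry-set jobs by (job class, error class), most common first."""
    counts = Counter()
    for raw in payloads:
        job = json.loads(raw)
        counts[(job.get("class", "?"), job.get("error_class", "?"))] += 1
    return counts.most_common()
```

A single dominant error class usually points at one root cause (for example, a dependency timing out) rather than many unrelated failures.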

Elasticsearch connectivity is not OK

The Search API uses Elasticsearch as its underlying data store and search engine.

If the application cannot connect to the Elasticsearch cluster, end users will not be able to perform searches.

Note: We use a managed Elasticsearch service, Amazon Elasticsearch Service, rather than running our own.

To solve this issue, look at the application logs to see what is wrong.

How do I investigate this?

Find out why the Search API can't connect to Elasticsearch.
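A quick way to check whether the cluster is reachable at all is Elasticsearch's cluster health API (GET /_cluster/health), whose response includes a "status" of green, yellow or red. The Python sketch below calls it using only the standard library. It assumes the domain endpoint accepts unsigned HTTP requests from where you run it; Amazon Elasticsearch Service domains with IAM-based access policies or VPC-only access may reject or fail to route the request, which is itself a useful clue.

```python
import json
import urllib.request


def cluster_health(base_url: str, timeout: float = 5.0) -> dict:
    """GET <base_url>/_cluster/health and return the parsed JSON body."""
    url = base_url.rstrip("/") + "/_cluster/health"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def diagnose(base_url: str, timeout: float = 5.0) -> str:
    """Return a one-line summary of whether the cluster is reachable."""
    try:
        health = cluster_health(base_url, timeout)
    except OSError as exc:
        # urllib.error.URLError/HTTPError and socket timeouts subclass OSError.
        return f"unreachable: {exc}"
    except ValueError:
        # The endpoint answered, but not with JSON (e.g. an HTML error page).
        return "reachable, but the response was not JSON"
    return f"reachable, cluster status {health.get('status', 'unknown')}"
```

For example, diagnose("https://<domain-endpoint>") from an application machine. "unreachable" points at networking or access policy; "red" points at the cluster itself.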

Reranker is not OK

The Search API uses machine learning to rank search results based on analytics data. If this alert fires, something has gone wrong with that process and we're serving results in the order returned by Elasticsearch.

Unlike the other healthcheck failures, this does not mean that the Search API is serving errors, only that it may be serving worse results.
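The behaviour described above is a fallback: try to rerank, and on any failure serve the original order. The Python sketch below illustrates that pattern in isolation. It is not the Search API's actual implementation, and the hypothetical reranker callable stands in for whatever calls the SageMaker endpoint.

```python
import logging
from typing import Callable, List, Sequence

logger = logging.getLogger(__name__)


def rank_results(
    results: Sequence[dict],
    reranker: Callable[[Sequence[dict]], List[dict]],
) -> List[dict]:
    """Return reranked results, or the original order if reranking fails."""
    try:
        return reranker(results)
    except Exception:
        # Any reranker failure degrades to the original ordering, not an error.
        logger.exception("reranking failed; serving original order")
        return list(results)
```

This is why users still get results while the alert is firing: the failure is logged and surfaced through the healthcheck rather than returned to the user.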

The machine learning model is hosted in Amazon SageMaker.

How do I investigate this?

Find out why the Search API can't get rankings from the SageMaker endpoint:

  • Look at the error message in the healthcheck response
  • Look at the Search API logs
  • Check the status of the SageMaker endpoint in the AWS console
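To check the endpoint status without the console, "aws sagemaker describe-endpoint --endpoint-name <name>" returns an EndpointStatus field. The Python sketch below does the same through a boto3 SageMaker client (boto3.client("sagemaker")), which is passed in so the function can be exercised with a stand-in object. The endpoint name is a placeholder.

```python
def sagemaker_endpoint_status(client, endpoint_name: str) -> str:
    """Return the endpoint's EndpointStatus, with FailureReason if not InService.

    client is expected to be boto3.client("sagemaker"); any object with a
    compatible describe_endpoint method works, e.g. a fake in tests.
    """
    response = client.describe_endpoint(EndpointName=endpoint_name)
    status = response["EndpointStatus"]
    if status != "InService":
        # FailureReason is only present for some states, such as Failed.
        reason = response.get("FailureReason", "no failure reason given")
        return f"{status}: {reason}"
    return status
```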
This page was last reviewed on 4 February 2020. It needs to be reviewed again on 4 August 2020 by the page owner #govuk-2ndline.