This page describes what to do when an Icinga alert fires. For more information, search the govuk-puppet repo for the source of the alert.

App healthcheck not ok

Many apps on GOV.UK have a healthcheck endpoint.

  • This is usually /healthcheck.
  • Some apps use an arbitrary page instead.

The alert works by making a request for the healthcheck endpoint on the machine where the app is running. If you need to test the healthcheck endpoint manually, you can SSH on to the machine and curl it yourself.

# SSH on to a machine running Content Publisher
gds govuk connect ssh -e integration backend

# Find the port it's running on
ps -ef | grep content-publisher | grep master

# Do the Icinga check manually
curl localhost:3221/healthcheck

Apps with a custom healthcheck endpoint often make use of the generic checks in govuk_app_config. Some apps also implement custom checks, and the alert should link to custom documentation to explain these.

Connection Refused Error

This means the app is not accepting requests for the healthcheck endpoint, and is probably down.

Timeout Error

This means the app is accepting requests, but taking too long to respond (over 20 seconds).

  • Try the healthcheck endpoint manually, as above, to confirm.
  • Check the logs, for example: tail -100f /var/log/email-alert-api/app.err.log
  • Check for resource issues e.g. on the Machine dashboard.
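
The two failure modes above can usually be told apart from curl's exit status: 7 means it could not connect, and 28 means the request timed out. The following is a minimal sketch, not part of the Icinga check itself. The function names are hypothetical, and the 20 second limit is taken from the timeout described above.

```shell
# Map a curl exit status to the failure modes described on this page.
describe_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;
    7)  echo "connection failed" ;;   # e.g. connection refused: app probably down
    22) echo "http error" ;;          # only reported when curl runs with --fail
    28) echo "timeout" ;;             # no response within --max-time
    *)  echo "other curl error ($1)" ;;
  esac
}

# Request a healthcheck URL, giving up after 20 seconds.
check_healthcheck() {
  status=0
  curl --silent --fail --output /dev/null --max-time 20 "$1" || status=$?
  describe_curl_exit "$status"
}
```

For example, after finding the port as shown earlier, check_healthcheck localhost:3221/healthcheck prints one of the labels above.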

Active Record Check

This means the app is unable to connect to its database.

# SSH on to a machine with the problem
gds govuk connect ssh -e integration backend

# Find the DB connection details
govuk_setenv content-publisher env | grep -i postgres

# Try to connect
psql -h postgresql-primary -U content-publisher -W content_publisher_production

# Try a command
select * from users;
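
If you only need to know whether the database server is reachable, a non-interactive check may be quicker than opening a psql session. The sketch below assumes the connection details are exposed as a URL in a variable named DATABASE_URL (an assumption, check the output of the govuk_setenv command above) and that pg_isready is installed on the machine. The host_from_url helper is hypothetical.

```shell
# Extract the host from a URL such as postgresql://user:pass@host:5432/dbname.
host_from_url() {
  rest="${1#*://}"     # drop the scheme
  rest="${rest##*@}"   # drop any user:password@ prefix
  rest="${rest%%/*}"   # drop the /dbname suffix
  echo "${rest%%:*}"   # drop the :port suffix
}

# Usage on the machine (DATABASE_URL is an assumed variable name):
#   pg_isready -h "$(host_from_url "$DATABASE_URL")"
# pg_isready exits 0 when the server is accepting connections.
```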

Sidekiq / Redis Check

This means that the Sidekiq workers can't connect to Redis.

  • Check for any Redis alerts.
  • Check for network connectivity to Redis.

# SSH on to a machine with the problem
gds govuk connect ssh -e integration backend

# Find the redis connection details
govuk_setenv content-publisher env | grep -i redis

# Try to connect
redis-cli -h backend-redis

# Try a command
keys *
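
Note that keys * scans the entire keyspace, which can be slow on a large instance. As a lighter connectivity test, a PING is usually enough: a healthy Redis replies with PONG. The following is a minimal sketch with hypothetical function names. The backend-redis host is taken from the example above.

```shell
# Succeed only if the reply is what a healthy Redis sends back for PING.
is_pong() {
  [ "$1" = "PONG" ]
}

# Succeed if the Redis host given as $1 answers a PING.
redis_reachable() {
  is_pong "$(redis-cli -h "$1" ping 2>/dev/null)"
}

# Usage on the machine:
#   redis_reachable backend-redis && echo "Redis is reachable"
```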

Sidekiq Retry Size Check

This means that Sidekiq jobs are failing.

  • Check the Sidekiq 'Retry set size' graph to see if we have a high number of failed jobs.
  • Are the workers reporting any problems, or are any issues being raised in Sentry?
  • Check Kibana for Sidekiq error logs (application: <app> AND @type: sidekiq).
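
If the graph is unavailable, the retry set size can also be read directly from Redis. Sidekiq keeps jobs awaiting retry in a sorted set named retry, optionally prefixed with a namespace. The sketch below uses hypothetical helper names. Whether a given app uses a namespace, and which Redis host it talks to, are assumptions to confirm from the app's configuration.

```shell
# Build a Sidekiq Redis key, with an optional namespace prefix.
sidekiq_key() {
  if [ -n "$1" ]; then
    echo "$1:$2"
  else
    echo "$2"
  fi
}

# Print the number of jobs in the retry set.
# $1: Redis host, $2: namespace (pass "" if the app does not use one)
retry_set_size() {
  redis-cli -h "$1" zcard "$(sidekiq_key "$2" retry)"
}

# Usage on the machine:
#   retry_set_size backend-redis ""
```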

Sidekiq Queue Latency Check

This means the time it takes for a Sidekiq job to be processed is unusually high.

  • Check the Sidekiq 'Queue Length' graph to see if we have a high number of jobs queued up.
  • Check the Machine dashboard or the AWS RDS postgres dashboard to see if we're experiencing resource issues.
  • Are the workers reporting any problems, or are any issues being raised in Sentry?
  • Check Kibana for Sidekiq error logs (application: <app> AND @type: sidekiq).

This page was last reviewed on 11 September 2020. It needs to be reviewed again on 11 March 2021 by the page owner #govuk-2ndline.