This page describes what to do in case of an Icinga alert. For more information, search the govuk-puppet repo for the source of the alert.
Last updated: 23 May 2022

App healthcheck not ok

Most apps on GOV.UK have a /healthcheck/ready endpoint that checks whether the app is “ready” to respond to requests, for example whether it can connect to its database.

This alert works by making a request to the app’s healthcheck endpoint on a machine where the app runs. For most apps the endpoint is /healthcheck/ready; for legacy apps it is an arbitrary page of the app instead.
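The check can be approximated by hand with curl. This is a rough sketch, not the actual govuk-puppet check: it assumes the 20-second limit described under Timeout Error below and relies on curl's documented exit codes (7 means it failed to connect, 28 means the operation timed out). The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers approximating the Icinga healthcheck by hand.
# Assumes a 20-second limit, matching the Timeout Error section.

# Map a curl exit code to the alert type it most likely corresponds to.
classify_curl_exit() {
  case "$1" in
    0)  echo "OK (check the HTTP status code)" ;;
    7)  echo "Connection Refused Error" ;;   # curl: failed to connect to host
    28) echo "Timeout Error" ;;              # curl: operation timed out
    *)  echo "Other curl error ($1)" ;;
  esac
}

# Request an app's healthcheck endpoint and report the result.
# Usage: check_health PORT [PATH]
check_health() {
  local port="$1" path="${2:-/healthcheck/ready}" status rc
  # curl prints 000 as the status code when no HTTP response was received.
  status=$(curl --silent --output /dev/null --max-time 20 \
    --write-out '%{http_code}' "http://localhost:${port}${path}")
  rc=$?
  echo "HTTP ${status}: $(classify_curl_exit "$rc")"
}
```

For example, check_health 3221 on the Content Publisher machine used below would print the HTTP status alongside the likely alert type.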

Most healthcheck endpoints use checks from govuk_app_config; guidance on these follows below. Some apps also implement custom checks, and for those the alert links to custom documentation explaining them.

Note: most apps also have a separate /healthcheck/live endpoint, which often just returns “200 OK”. This endpoint is meant to be a lightweight check for infrastructure such as AWS Elastic Load Balancers (ELBs). Read more about separate healthchecks.
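Comparing the two endpoints can narrow down a failure: if /healthcheck/live succeeds but /healthcheck/ready does not, the process is serving requests but one of its dependencies is failing. A minimal sketch, assuming the app has both endpoints; the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: interpret the HTTP status codes returned by
# /healthcheck/live and /healthcheck/ready for the same app.
interpret_health() {
  local live="$1" ready="$2"
  if [ "$live" != "200" ]; then
    echo "live check failing (HTTP $live): app is probably down"
  elif [ "$ready" != "200" ]; then
    echo "app up but not ready (HTTP $ready): check its dependencies"
  else
    echo "live and ready"
  fi
}

# Fetch just the HTTP status code for one endpoint (000 if unreachable).
# Usage: status_of PORT PATH
status_of() {
  curl --silent --output /dev/null --max-time 20 \
    --write-out '%{http_code}' "http://localhost:$1$2"
}
```

Usage: interpret_health "$(status_of 3221 /healthcheck/live)" "$(status_of 3221 /healthcheck/ready)".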

Connection Refused Error

This means the app is not accepting connections for the healthcheck endpoint and is probably down, especially if the alert appears alongside an “upstart not up” alert. For example, to check Content Publisher by hand:

  # SSH on to a machine running Content Publisher
  gds govuk connect ssh -e integration backend

  # Find the port it's running on
  ps -ef | grep content-publisher | grep master

  # Do the Icinga check manually
  curl localhost:3221/healthcheck/ready
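Since a refused connection often means the process has stopped, checking the app's upstart job can confirm this. A sketch, assuming the upstart job is named after the app and that upstart's status command prints its usual "<job> start/running, process <pid>" line; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide from upstart `status` output whether a job runs.
# Upstart reports lines like "<job> start/running, process <pid>"
# for a running job and "<job> stop/waiting" for a stopped one.
job_is_running() {
  case "$1" in
    *" start/running"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage on the machine (assumes the job is named after the app):
#   job_is_running "$(sudo status content-publisher)" || echo "job not running"
```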

Timeout Error

This means the app is accepting requests, but taking too long to respond (over 20 seconds).

  • Try the healthcheck endpoint manually, as above, to confirm.
  • Check the app's logs, e.g. tail -100f /var/log/email-alert-api/app.err.log.
  • Check for resource issues, e.g. on the Machine dashboard.
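To confirm a timeout, it helps to see how long the endpoint actually takes. A minimal sketch using curl's time_total write-out variable, assuming the 20-second limit above; the helper names and the extra 10-second margin are illustrative choices.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: time a healthcheck request and compare it with
# the 20-second limit this alert uses.
TIMEOUT_SECONDS=20

# Succeeds if the given duration (seconds, may be fractional) exceeds the limit.
over_limit() {
  awk -v t="$1" -v limit="$TIMEOUT_SECONDS" 'BEGIN { exit !(t > limit) }'
}

# Print the total request time reported by curl for one endpoint.
# --max-time is set above the limit so slow responses can still be measured.
# Usage: time_health PORT [PATH]
time_health() {
  curl --silent --output /dev/null --max-time $((TIMEOUT_SECONDS + 10)) \
    --write-out '%{time_total}\n' "http://localhost:$1${2:-/healthcheck/ready}"
}
```

Usage: over_limit "$(time_health 3221)" && echo "slower than the alert allows".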

Active Record Check

This means the app is unable to connect to its database.

  • Check for any RDS alerts.
  • Check the corresponding AWS RDS dashboard to see if we’re experiencing resource issues.
  • Check for network connectivity to the DB.
  # SSH on to a machine with the problem
  gds govuk connect ssh -e integration backend

  # Find the DB connection details
  govuk_setenv content-publisher env | grep -i postgres

  # Try to connect
  psql -h postgresql-primary -U content-publisher -W content_publisher_production

  # Try a lightweight query
  select * from users limit 1;
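For a quick non-interactive connectivity test, PostgreSQL's pg_isready utility reports the server's state through its exit code (0 accepting, 1 rejecting, 2 no response, 3 no attempt made). A sketch of a hypothetical helper that explains those codes; the host is taken from the psql example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: explain pg_isready's documented exit codes.
explain_pg_isready() {
  case "$1" in
    0) echo "server is accepting connections" ;;
    1) echo "server is rejecting connections (e.g. starting up)" ;;
    2) echo "no response: check network connectivity to the DB" ;;
    3) echo "no attempt made: check the connection parameters" ;;
    *) echo "unexpected exit code $1" ;;
  esac
}

# Usage on the machine:
#   pg_isready -h postgresql-primary; explain_pg_isready $?
```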

Sidekiq / Redis Check

This means that the Sidekiq workers can’t connect to Redis.

  • Check for any Redis alerts.
  • Check for network connectivity to Redis.
  # SSH on to a machine with the problem
  gds govuk connect ssh -e integration backend

  # Find the redis connection details
  govuk_setenv content-publisher env | grep -i redis

  # Try to connect
  redis-cli -h backend-redis

  # Try a lightweight command (a healthy server replies PONG);
  # avoid KEYS *, which can block Redis on large datasets
  ping
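The same check can be scripted: a healthy Redis server answers PING with PONG, and redis-cli prints nothing on stdout if it cannot connect. A minimal sketch with a hypothetical helper; the host is taken from the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the Redis reply is PONG.
redis_reply_ok() {
  [ "$1" = "PONG" ]
}

# Usage on the machine:
#   redis_reply_ok "$(redis-cli -h backend-redis ping)" || echo "Redis not responding"
```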