App healthcheck not ok
Most apps on GOV.UK have a /healthcheck/ready endpoint that checks if the app is “ready” to respond to requests, including things like connecting to a database (example).
This alert works by making a request to the app’s healthcheck endpoint on a machine where the app runs. For most apps the endpoint is /healthcheck/ready; for legacy apps it’s just an arbitrary page (example).
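The request the alert makes can be approximated from a shell on the app's machine. This is a minimal sketch, not the real Icinga check: the helper name check_app_health is hypothetical, and it assumes the app listens on localhost, that only an HTTP 200 counts as healthy, and that the 20-second limit this page gives for timeouts applies.

```shell
#!/usr/bin/env bash
# Hypothetical helper approximating the "app healthcheck not ok" alert.
# Assumptions: the app listens on localhost:PORT and only HTTP 200 counts
# as healthy. The real Icinga check may use different options.

# check_app_health PORT [PATH]
# Prints the HTTP status code ("000" if no response arrived) and returns 0
# only when that code is 200. PATH defaults to /healthcheck/ready.
check_app_health() {
  local port="$1"
  local path="${2:-/healthcheck/ready}"
  local code
  # --max-time 20 gives up after 20 seconds; "|| true" keeps going when
  # curl fails so the "000" status code is still reported.
  code=$(curl --silent --output /dev/null --max-time 20 \
    --write-out '%{http_code}' "http://localhost:${port}${path}") || true
  echo "$code"
  [ "$code" = "200" ]
}
```

For example, `check_app_health 3221` probes Content Publisher on the port used in the Connection Refused example; a legacy app would pass the page it is checked on as the second argument.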
Most healthcheck endpoints use checks from govuk_app_config; see below for guidance on these. Some apps also implement custom checks, and for those the alert links to app-specific documentation:
- Search API app healthcheck not ok
- content-data-api app healthcheck not ok
- datagovuk_publish app healthcheck not ok
Note: most apps also have a separate /healthcheck/live endpoint, which often just returns “200 OK” (example). This endpoint is meant to be a lightweight check for use with certain types of infrastructure, such as AWS Elastic Load Balancers (ELBs). Read more about separate healthchecks.
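Because the two endpoints check different things, comparing them can narrow down a failure. The sketch below is hypothetical (the function names and the rule of thumb are assumptions, not official guidance): if /healthcheck/live answers 200 but /healthcheck/ready does not, a dependency such as the database or Redis is the likelier culprit; if both fail, the app itself may be down.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic comparing the live and ready endpoints.
# Rule of thumb (an assumption): live OK + ready failing points at a
# dependency; live failing suggests the app process itself is down.

# http_status URL: prints the HTTP status code, or "000" if no response.
# "|| true" makes the function succeed even when curl fails.
http_status() {
  curl --silent --output /dev/null --max-time 20 \
    --write-out '%{http_code}' "$1" || true
}

# diagnose_endpoints PORT: prints a one-line summary for localhost:PORT.
diagnose_endpoints() {
  local base="http://localhost:$1"
  local live ready
  live=$(http_status "${base}/healthcheck/live")
  ready=$(http_status "${base}/healthcheck/ready")
  if [ "$live" = "200" ] && [ "$ready" = "200" ]; then
    echo "both ok"
  elif [ "$live" = "200" ]; then
    echo "live ok, ready ${ready}: check dependencies"
  else
    echo "live ${live}: app may be down"
  fi
}
```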
Connection Refused Error
This means the app is not accepting requests for the healthcheck endpoint and is probably down, e.g. if the alert is appearing alongside an upstart not up alert.
- Check the processes are running.
- Try the healthcheck endpoint manually:
# SSH on to a machine running Content Publisher
gds govuk connect ssh -e integration backend

# Find the port it's running on
ps -ef | grep content-publisher | grep master

# Do the Icinga check manually
curl localhost:3221/healthcheck/ready
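When trying the endpoint manually, curl's exit status tells you which of the failure modes on this page you are seeing. A minimal sketch, assuming curl's documented exit codes (7 = failed to connect, which includes connection refused; 28 = operation timed out); the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper mapping curl's exit status to a failure mode.
# Exit codes per the curl manual: 0 = success (any HTTP status),
# 7 = failed to connect (e.g. connection refused), 28 = timed out.

# classify_curl_exit CODE: prints a short label for the exit status.
classify_curl_exit() {
  case "$1" in
    0)  echo "responded" ;;
    7)  echo "connection refused" ;;
    28) echo "timeout" ;;
    *)  echo "other curl error $1" ;;
  esac
}

# probe PORT: requests the readiness endpoint with a 20-second limit
# and prints the classification of curl's exit status.
probe() {
  local status=0
  curl --silent --output /dev/null --max-time 20 \
    "http://localhost:$1/healthcheck/ready" || status=$?
  classify_curl_exit "$status"
}
```

For example, `probe 3221` on a Content Publisher machine prints "connection refused" if nothing is listening on that port.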
Timeout Error
This means the app is accepting requests but taking too long to respond (more than 20 seconds).
- Try the healthcheck endpoint manually, as above, to confirm.
- Check the app’s error log, e.g. tail -100f /var/log/email-alert-api/app.err.log for email-alert-api.
- Check for resource issues e.g. on the Machine dashboard.
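To confirm a timeout rather than guess, you can measure how long the readiness check takes. A minimal sketch with hypothetical function names: it uses curl's `%{time_total}` write-out variable and compares against the 20-second limit above; the 25-second curl limit is an arbitrary choice so that slow responses can still finish and be measured.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the timeout case.

# exceeds_limit SECONDS LIMIT: succeeds if SECONDS > LIMIT.
# awk handles decimal values; "+ 0" forces a numeric comparison.
exceeds_limit() {
  awk -v t="$1" -v limit="$2" 'BEGIN { exit !(t + 0 > limit + 0) }'
}

# time_healthcheck PORT: prints "<status code> <total seconds>" for the
# readiness endpoint. --max-time 25 is an arbitrary ceiling above 20s.
time_healthcheck() {
  curl --silent --output /dev/null --max-time 25 \
    --write-out '%{http_code} %{time_total}\n' \
    "http://localhost:$1/healthcheck/ready" || true
}
```

For example, `read -r code secs <<< "$(time_healthcheck 3221)"` followed by `exceeds_limit "$secs" 20 && echo "slower than the alert allows"`.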
Active Record Check
This means the app is unable to connect to its database.
- Check for any RDS alerts.
- Check the corresponding AWS RDS dashboard to see if we’re experiencing resource issues.
- Check for network connectivity to the DB.
# SSH on to a machine with the problem
gds govuk connect ssh -e integration backend

# Find the DB connection details
govuk_setenv content-publisher env | grep -i postgres

# Try to connect
psql -h postgresql-primary -U content-publisher -W content_publisher_production

# Try a command
select * from users;
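For a quick non-interactive check of network connectivity, pg_isready (part of the PostgreSQL client tools) reports whether the server is accepting connections. A minimal sketch, assuming pg_isready is installed on the machine and PostgreSQL uses the default port 5432; check_postgres is a hypothetical name. pg_isready does not test the app's credentials, so psql is still needed for that.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around pg_isready. Its documented exit codes:
# 0 = accepting, 1 = rejecting (e.g. starting up), 2 = no response,
# 3 = no attempt made (e.g. invalid parameters).

# check_postgres HOST [PORT]: prints a summary and returns pg_isready's status.
check_postgres() {
  local status=0
  # -t 10 waits up to 10 seconds for a response.
  pg_isready -h "$1" -p "${2:-5432}" -t 10 >/dev/null || status=$?
  case "$status" in
    0) echo "accepting connections" ;;
    1) echo "rejecting connections (e.g. starting up)" ;;
    2) echo "no response: check network connectivity" ;;
    *) echo "check not attempted: bad parameters" ;;
  esac
  return "$status"
}
```

For example, `check_postgres postgresql-primary` using the host from the psql command above.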
Sidekiq / Redis Check
This means that the Sidekiq workers can’t connect to Redis.
- Check for any Redis alerts.
- Check for network connectivity to Redis.
# SSH on to a machine with the problem
gds govuk connect ssh -e integration backend

# Find the redis connection details
govuk_setenv content-publisher env | grep -i redis

# Try to connect
redis-cli -h backend-redis

# Try a command (PING is cheap; KEYS * can block a busy Redis)
ping
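The same check can be run without an interactive redis-cli session: a reachable Redis server answers PING with PONG. A minimal sketch, assuming redis-cli is installed, the server needs no password and listens on the default port 6379; check_redis and the REDIS_CLI override are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical non-interactive Redis connectivity check.
# REDIS_CLI lets you point at a specific redis-cli (defaults to PATH).

# check_redis HOST [PORT]: prints the reply and returns 0 only on PONG.
check_redis() {
  local cli="${REDIS_CLI:-redis-cli}"
  local reply
  # 2>&1 captures connection errors too; "|| true" keeps going on failure.
  reply=$("$cli" -h "$1" -p "${2:-6379}" ping 2>&1) || true
  echo "$reply"
  [ "$reply" = "PONG" ]
}
```

For example, `check_redis backend-redis` using the host from the commands above.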