
Monitoring

Uptime Metrics

Uptime metrics are collected for content-store, hmrc-manuals-api, link-checker-api, manuals-publisher, publishing-api, specialist-publisher, travel-advice-publisher and whitehall-admin. They are available as a Grafana dashboard.

The dashboard breaks the metrics down into a day-by-day view, with each day highlighted in a colour representing its level of uptime. Green means 100%, orange means above 99.31% (99.31% is equivalent to 10 minutes of downtime in a day) and red means anything lower.
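As a rough illustration of the colour bands described above, the sketch below derives the 99.31% figure from 10 minutes of downtime in a 24-hour day and maps a daily uptime percentage to a colour. The function names are hypothetical and are not taken from the dashboard configuration; the dashboard itself may apply the thresholds differently.

```python
MINUTES_PER_DAY = 24 * 60  # 1440 minutes


def uptime_threshold(allowed_downtime_minutes: float) -> float:
    """Return the daily uptime percentage left after the given downtime."""
    return 100 * (MINUTES_PER_DAY - allowed_downtime_minutes) / MINUTES_PER_DAY


def colour_for(uptime_percent: float) -> str:
    """Map a daily uptime percentage to the colour bands described in the text.

    Hypothetical helper: the thresholds come from the text, the function
    itself does not exist in the dashboard.
    """
    if uptime_percent >= 100:
        return "green"
    if uptime_percent > 99.31:
        return "orange"
    return "red"
```

For example, `round(uptime_threshold(10), 2)` gives the 99.31 figure quoted above, and a day with 99.5% uptime would fall in the orange band.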

Further Reading

The service that collects the uptime data runs on the monitoring machines and can be seen in govuk-puppet. It works by polling a given endpoint, such as /healthcheck, every 5 seconds, and records that an application is up if it receives a 2xx HTTP status code back. It uses statsd to send this data to Graphite under the names stats.gauges.uptime.<application>, which are then used in the Grafana dashboard.
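The polling approach described above could be sketched roughly as follows. This is not the collector from govuk-puppet, only a minimal illustration. It assumes that "up" is reported as a gauge value of 1 and "down" as 0, and that statsd listens on its default UDP port 8125 on localhost. All function names are hypothetical. The gauge line uses the standard statsd `name:value|g` format. With a default statsd configuration, gauges are typically stored in Graphite under a `stats.gauges.` prefix.

```python
import socket
import urllib.error
import urllib.request


def is_up(status_code: int) -> bool:
    """An application counts as up only for 2xx responses."""
    return 200 <= status_code < 300


def fetch_status(url: str, timeout: float = 5.0):
    """Return the HTTP status code for url, or None if the request failed.

    Note: urllib follows redirects by default, so the status of the final
    response is returned.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # Non-2xx responses such as 404 or 503 are raised as HTTPError.
        return error.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout and similar.
        return None


def gauge_line(application: str, up: bool) -> str:
    """Build a statsd gauge line (assumed encoding: 1 for up, 0 for down)."""
    return f"uptime.{application}:{1 if up else 0}|g"


def report(application: str, up: bool, host: str = "localhost", port: int = 8125) -> None:
    """Send one gauge value to statsd over UDP (host and port are assumptions)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(gauge_line(application, up).encode("ascii"), (host, port))
```

A collector built on these helpers would call `fetch_status` for each application every 5 seconds, treat a `None` result as down, and pass the outcome to `report`.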

If you would like to add another app to the uptime collector, you should first make sure there is a /healthcheck endpoint available and then add your application to the end of the line in the service file.

Note: If there is no exposed /healthcheck endpoint available, an alternative can be given by using the format service_name:alternative/endpoint.
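To make the entry format concrete, here is a minimal sketch of how a single entry might be interpreted. The `parse_service` function is hypothetical and does not mirror the real service file's parsing. It assumes that an entry without a colon falls back to /healthcheck and that the alternative endpoint is written without a leading slash, as in the format above. How entries are separated within the service file is not covered here.

```python
def parse_service(entry: str, default_endpoint: str = "healthcheck"):
    """Split 'service_name' or 'service_name:alternative/endpoint' into parts.

    Returns a (service_name, path) tuple, where path always starts with '/'.
    Hypothetical helper for illustration only.
    """
    name, _, endpoint = entry.partition(":")
    # Fall back to the default when no alternative endpoint is given.
    endpoint = endpoint or default_endpoint
    return name, "/" + endpoint.lstrip("/")
```

Under these assumptions, `parse_service("publishing-api")` yields `("publishing-api", "/healthcheck")`, while `parse_service("service_name:alternative/endpoint")` yields `("service_name", "/alternative/endpoint")`.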

This page was last reviewed on 16 March 2019. It needs to be reviewed again on 16 March 2020 by the page owner #govuk-2ndline.