Uptime Metrics

Uptime metrics are collected for content-store, hmrc-manuals-api, link-checker-api, manuals-publisher, publishing-api, specialist-publisher and travel-advice-publisher. They are available as a Grafana dashboard.

The dashboard breaks the metrics down into a day-by-day view, with each day coloured according to its level of uptime. Green means 100%, orange means above 99.31% (equivalent to about 10 minutes of downtime in a day) and red means everything else.
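The arithmetic behind the colour bands can be sketched in a few lines of Python. This is an illustration only, not the dashboard's actual configuration: the function names are hypothetical, and treating exactly 99.31% as red is an assumption based on the word "above".

```python
MINUTES_PER_DAY = 24 * 60  # 1440 minutes in a day


def uptime_percent(downtime_minutes: float) -> float:
    """Convert minutes of downtime in one day into an uptime percentage."""
    return 100.0 * (1 - downtime_minutes / MINUTES_PER_DAY)


def colour(percent: float) -> str:
    """Map a daily uptime percentage to the dashboard colour described above."""
    if percent >= 100.0:
        return "green"
    if percent > 99.31:  # assumption: the boundary value itself counts as red
        return "orange"
    return "red"


# 10 minutes of downtime is 99.3055...% uptime, which rounds to the 99.31% threshold.
ten_minutes = round(uptime_percent(10), 2)
```

Note that 99.31% is a rounded figure: a day with exactly 10 minutes of downtime comes out at roughly 99.306%, so under a strict comparison it sits just below the threshold.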

Further Reading

The service that collects the uptime data runs on the monitoring machines and its source can be found in govuk-puppet. It works by polling /healthcheck every 5 seconds and records that an application is up if it receives a 2xx HTTP status code back. It uses statsd to send this data to Graphite under the names stats.gauges.uptime.<application>, which are then used in the Grafana dashboard.
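The polling loop described above could look roughly like the following Python sketch. It is not the collector from govuk-puppet: the function names, the statsd address (statsd's default UDP port on localhost), the timeout and the choice of 1 and 0 as gauge values are all assumptions made for illustration. It relies on statsd's plain-text gauge format, `<name>:<value>|g`, and on statsd's default namespace placing gauges under stats.gauges in Graphite.

```python
import socket
import time
import urllib.error
import urllib.request

STATSD_ADDR = ("127.0.0.1", 8125)  # assumed: statsd listening on its default UDP port


def is_up(status: int) -> bool:
    """An application counts as up only for a 2xx HTTP status code."""
    return 200 <= status < 300


def gauge_line(app: str, up: bool) -> str:
    """Build a statsd gauge message; the 1/0 encoding is an assumption."""
    return f"uptime.{app}:{1 if up else 0}|g"


def check(url: str, timeout: float = 2.0) -> bool:
    """Request the healthcheck URL once; any error or non-2xx reply means down."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_up(response.status)
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, connection failures and timeouts.
        return False


def report(app: str, up: bool) -> None:
    """Send one gauge reading to statsd over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(gauge_line(app, up).encode("ascii"), STATSD_ADDR)


def run(app: str, url: str, interval: float = 5.0) -> None:
    """Poll forever, reporting the result every `interval` seconds."""
    while True:
        report(app, check(url))
        time.sleep(interval)
```

Calling `run("publishing-api", "http://publishing-api.example/healthcheck")` (a hypothetical hostname) would emit one gauge reading every 5 seconds. Treating a timeout or connection failure as "down" is a design choice in this sketch; the real collector may handle those cases differently.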

If you would like to add another app to the uptime collector, first make sure it exposes a /healthcheck endpoint, then add the application to the end of the line in the service file.

