Uptime metrics are collected for the
content-store and the
they are available as a Grafana dashboard.
The dashboard breaks uptime down into a day-by-day view, with each day highlighted in a colour representing its level of uptime. Green means 100%, orange means above 99.31% (equivalent to roughly 10 minutes of downtime in a day) and red means anything lower.
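As a worked check of the thresholds above, the following minimal sketch converts a day's downtime into an uptime percentage and maps it to a colour. The function names are hypothetical and are not taken from the dashboard's configuration; only the 100% and 99.31% thresholds come from the description above.

```python
SECONDS_PER_DAY = 24 * 60 * 60
ORANGE_THRESHOLD = 99.31  # percent, as described for the dashboard


def uptime_percent(downtime_seconds):
    """Uptime for one day, as a percentage, given seconds of downtime."""
    return 100.0 * (SECONDS_PER_DAY - downtime_seconds) / SECONDS_PER_DAY


def colour_for(uptime):
    """Map an uptime percentage to a colour (hypothetical helper)."""
    if uptime >= 100.0:
        return "green"
    if uptime > ORANGE_THRESHOLD:
        return "orange"
    return "red"
```

Ten minutes of downtime is 600 of the 86,400 seconds in a day, which gives an uptime of about 99.306%; rounded to two decimal places, that is the 99.31% threshold.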
The service which collects the uptime data runs on the monitoring machines, and
its source can be found in govuk-puppet. It works by polling each application's
/healthcheck every 5 seconds and records an application as up if it receives
a 2xx HTTP status code back. It uses statsd to send this data to Graphite under
stats.gauges.uptime.<application>, which is then used in the Grafana dashboard.
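The polling approach described above could be sketched roughly as follows. This is not the collector from govuk-puppet: the function names, the 1/0 gauge values, the request timeout, and the statsd address are all assumptions made for illustration. Only the /healthcheck path, the 5 second interval, the 2xx rule, and the uptime.<application> metric name come from the description.

```python
import socket
import time
import urllib.error
import urllib.request

POLL_INTERVAL_SECONDS = 5


def is_up(status_code):
    """An application counts as up only for a 2xx response."""
    return 200 <= status_code < 300


def fetch_status(url, timeout=2.0):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # urlopen raises HTTPError for 4xx and 5xx responses
    except (urllib.error.URLError, OSError):
        return None  # connection refused, DNS failure, timeout, etc.


def gauge_packet(application, up):
    """Format a statsd gauge line: 1 when up, 0 when down (assumed values)."""
    return f"uptime.{application}:{1 if up else 0}|g".encode("ascii")


def poll_once(applications, sock, statsd_address, fetch=fetch_status):
    """Check every application once and send one gauge per application."""
    for application, url in applications.items():
        status = fetch(url)
        up = status is not None and is_up(status)
        sock.sendto(gauge_packet(application, up), statsd_address)


def run_forever(applications, statsd_address=("localhost", 8125)):
    """Poll all applications every POLL_INTERVAL_SECONDS (assumed statsd address)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        poll_once(applications, sock, statsd_address)
        time.sleep(POLL_INTERVAL_SECONDS)
```

A caller would pass a mapping of application names to healthcheck URLs, for example `run_forever({"content-store": "http://content-store.example/healthcheck"})`, where the hostname is a placeholder. With statsd's default namespacing, a gauge named uptime.content-store would appear in Graphite under stats.gauges.uptime.content-store.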
If you would like to add another app to the uptime collector, you should first
make sure there is a
/healthcheck endpoint available and then add your
application to the end of the line in the service file.