Monitoring

Add a Pingdom check

GOV.UK uses Pingdom to provide an external view of the availability of our services. This contrasts with our internal monitoring, which is performed from within the same local network. With internal monitoring (such as Smokey) we can tell that services are serving requests, but this doesn’t necessarily tell us that users can reach them (for example, there could be problems with DNS or a network misconfiguration). Pingdom fills this gap by providing a different perspective.

Pingdom operates by making pre-defined requests at a regular interval (typically 1 minute). If a request returns a non-success response, Pingdom deems the host to be down. After a suitable threshold of downtime (typically 5 minutes), it alerts that the host is down.
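The check-then-alert behaviour described above can be sketched as a small function. This is an illustration only, not Pingdom's actual implementation: the function name and parameters are hypothetical, and it assumes downtime is measured from the first failed check in an unbroken run of failures.

```python
from typing import Iterable, Optional


def first_alert_minute(
    results: Iterable[bool],
    alert_after_minutes: int = 5,
    interval_minutes: int = 1,
) -> Optional[int]:
    """Return the minute at which an alert would fire, or None if it never does.

    `results` holds one entry per check: True for a success response,
    False for a non-success response.
    """
    down_since = None  # minute of the first failure in the current outage
    for index, ok in enumerate(results):
        now = index * interval_minutes
        if ok:
            # A successful check ends the outage and resets the clock.
            down_since = None
            continue
        if down_since is None:
            down_since = now
        # Assumption: alert once the outage has lasted the full threshold.
        if now - down_since >= alert_after_minutes:
            return now
    return None
```

Under these assumptions, a short blip that recovers within the threshold never alerts, which is the buffering effect the 5 minute setting is intended to provide.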

How to access GOV.UK Pingdom

Credentials for Pingdom are available in govuk-secrets via the 2nd line password store under monitoring/pingdom.

When to add a check

You should add a Pingdom check when we gain extra value from the external perspective, that is, when it can tell us something more than we can get from our own internal monitoring.

For example, we benefit from a single check on the assets.publishing.service.gov.uk hostname, which confirms that an external user can reach that hostname. Adding further checks for individual assets served on this hostname does not provide any additional information from an external perspective.

If you are considering a Pingdom check that would not provide any additional external insight, consider adding a test to Smokey instead.
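One rough way to apply the hostname example is to reduce a list of candidate URLs to the hostnames that do not yet have a check. The sketch below is a simplification of the guidance, not a rule: the function name is hypothetical, and the asset paths in the usage example are made up for illustration.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def hostnames_needing_checks(
    candidate_urls: Iterable[str],
    already_checked_hostnames: Iterable[str],
) -> List[str]:
    """Return each hostname that appears in the candidates but has no check yet."""
    seen = set(already_checked_hostnames)
    needed = []
    for url in candidate_urls:
        host = urlsplit(url).hostname
        if host and host not in seen:
            # Only the first URL on a new hostname adds external insight.
            seen.add(host)
            needed.append(host)
    return needed


# Hypothetical candidates: two assets on an already-checked hostname, one new hostname.
candidates = [
    "https://assets.publishing.service.gov.uk/example.css",
    "https://assets.publishing.service.gov.uk/example.js",
    "https://www.gov.uk/",
]
print(hostnames_needing_checks(candidates, ["assets.publishing.service.gov.uk"]))
```

URLs filtered out this way are candidates for a Smokey test rather than a Pingdom check.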

Adding a check

These instructions are based on adding a check for an HTTP request. If you are checking something unusual or with specific needs, you may need to adjust them for your use case.

  1. In Pingdom, visit the uptime section and click “Add new”.
  2. Name your check based on the service you are monitoring.
  3. Leave the default check interval at 1 minute.
  4. Select the “HTTP(S) check”.
  5. Enter the URL you are monitoring.
  6. Leave the location as the default (North America/Europe).
  7. Leave “Check importance” at the default of “High importance”. We don’t have different configurations for high and low importance.
  8. Select “GOV.UK 2nd line support” in the “Who to alert?” section and uncheck “Platform Team”. This means the 2nd line support email is notified when services are down for long enough to alert.
  9. Leave “When down, alert after” at the default value of 5 minutes. This offers a buffer against alerting for a short-lived spike.
  10. Update the “Customized message” to capture any links to documentation or specific steps that would help someone if they received this alert.
  11. In the “Webhook” section, check the PagerDuty integration. This means the in-hours and out-of-hours support team is called when the service is down and the alert threshold has passed.
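The settings chosen in the steps above can be summarised as a simple record with a review helper. This is a hypothetical sketch for checking a planned configuration against this page. It is not the Pingdom API, and the class, field, and function names are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UptimeCheckSettings:
    """Hypothetical record of the form fields filled in by the steps above."""

    name: str
    url: str
    check_type: str = "HTTP(S)"
    interval_minutes: int = 1
    location: str = "North America/Europe"
    importance: str = "High importance"
    who_to_alert: List[str] = field(
        default_factory=lambda: ["GOV.UK 2nd line support"]
    )
    alert_after_minutes: int = 5
    customized_message: str = ""
    pagerduty_webhook: bool = True


def review(settings: UptimeCheckSettings) -> List[str]:
    """List the ways the settings differ from the documented steps."""
    problems = []
    if settings.interval_minutes != 1:
        problems.append("check interval should stay at 1 minute")
    if settings.alert_after_minutes != 5:
        problems.append("alert threshold should stay at 5 minutes")
    if "GOV.UK 2nd line support" not in settings.who_to_alert:
        problems.append("GOV.UK 2nd line support should be alerted")
    if "Platform Team" in settings.who_to_alert:
        problems.append("Platform Team should be unchecked")
    if not settings.customized_message.strip():
        problems.append("customized message should link to documentation")
    if not settings.pagerduty_webhook:
        problems.append("PagerDuty webhook should be checked")
    return problems
```

An empty list from `review` means the planned check matches the defaults described here. Checks with specific needs may legitimately differ.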
This page was last reviewed on 8 June 2020. It needs to be reviewed again on 8 June 2021 by the page owner #govuk-developers.