How we handle errors

How we serve errors on GOV.UK

When a request to GOV.UK fails, we need to handle the error in some way, so that GOV.UK does not look completely broken. Rather than embedding this into every app, we have multiple layers of error handling.

Note that publishing apps do not have these same layers of error handling; they are not behind a CDN. A publishing app is expected to handle all errors itself, according to the policy in the next section.

* Due to the way errors are handled by the origin servers, it's not currently possible for a frontend app to render a contextual error page for an error response. One option to get around this is to deviate from normal web semantics, and return a 200 response. This should be avoided because such responses are cacheable, indexable (by search engines), and surprising for anyone trying to look for errors in logs.

How we handle errors in our apps

This policy describes what we should do for different types of errors. It applies to all apps, irrespective of other error handling by downstream servers. This policy was first proposed in RFC 87.

High priority errors

  1. When something goes wrong, we should be notified. Applications should report exceptions to Sentry. Applications must not swallow errors.

  2. Notifications should be actionable. A Sentry notification should require a developer of the app to do something about it, rather than being just a piece of information.

  3. Applications should not continue to have these errors. The goal of GOV.UK is that applications should not error. When something goes wrong it should be fixed.

Examples of high-priority errors:

  • Bugs, where the application crashes unexpectedly

  • Sidekiq non-retryable errors (or retries exhausted)
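
The rules in this section can be sketched in plain Ruby. This is a minimal illustration, not GOV.UK's actual implementation: `FakeReporter` is a hypothetical stand-in for an error-tracking client such as Sentry, and `with_error_reporting` is an invented helper name. It shows an exception being reported and then re-raised, so the error is never swallowed.

```ruby
# Hypothetical stand-in for an error-tracking client such as Sentry.
# It only collects exceptions in memory so the sketch runs on its own.
class FakeReporter
  attr_reader :reported

  def initialize
    @reported = []
  end

  def notify(exception)
    @reported << exception
  end
end

# Hypothetical helper: report any unexpected error, then re-raise it so
# the caller (or the framework) still sees the failure.
def with_error_reporting(reporter)
  yield
rescue StandardError => e
  reporter.notify(e)
  raise
end

# Usage: the error is recorded by the reporter and still propagates.
reporter = FakeReporter.new
begin
  with_error_reporting(reporter) { Integer("not a number") }
rescue ArgumentError
  # A real app would let the framework render its error response here.
end
```

Re-raising after reporting keeps the failure visible to whatever sits above the app, while the reporter call is what makes the error a notification someone can act on.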

Low priority errors

  1. When something goes wrong, the error should be recorded in Kibana logs and/or application metrics. A team can then monitor these errors over time, and prioritise ones to fix.

Examples of low-priority errors:

  • Intermittent errors without user impact e.g. user sees a cached version of a page, due to an upstream API timeout, such as a request to the Content Store.

  • Intermittent errors with user impact e.g. an API request timeout occurs when a publisher tries to publish a document in one of our publishing apps.

    • When such errors are expected, we should show the user an error page that gives instructions for how to correct the issue, whether by taking action themselves, or submitting a request for support.

  • User input errors e.g. user submits a form with invalid data.

    • For all 422 "unprocessable entity" errors, we should show the user an error page that gives instructions for how to correct the issue, whether by taking action themselves, or submitting a request for support.

    • For all 404 "not found" errors, we should show the user an error page that gives instructions for how to proceed. Note that 404 responses are returned by default if using Rails' ActiveRecord #find or similar.

  • IP spoof errors (HTTP 400). Rails reports ActionDispatch::RemoteIp::IpSpoofAttackError.

  • Environmental errors e.g. errors due to data sync.
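
One way to picture the split between the two priorities is a small routing function. The sketch below is illustrative only: the error classes, the `LOW_PRIORITY` list and the `route_error` helper are all hypothetical names, and a real app would use its own exception classes, logger and metrics client. Here an in-memory counter stands in for application metrics and a lambda stands in for the Sentry client.

```ruby
require "logger"
require "stringio"

# Hypothetical error classes standing in for two of the categories above.
class UpstreamTimeout < StandardError; end
class InvalidUserInput < StandardError; end

# Hypothetical list of error classes an app has chosen to treat as low priority.
LOW_PRIORITY = [UpstreamTimeout, InvalidUserInput].freeze

# Low-priority errors are logged and counted so they can be monitored over
# time; everything else is treated as high priority and sent to the reporter.
def route_error(error, logger:, counts:, reporter:)
  if LOW_PRIORITY.any? { |klass| error.is_a?(klass) }
    logger.warn("#{error.class}: #{error.message}")
    counts[error.class.name] += 1
    :low
  else
    reporter.call(error)
    :high
  end
end

# Usage with in-memory stand-ins for the log, the metrics and the reporter.
log_output = StringIO.new
logger = Logger.new(log_output)
counts = Hash.new(0)
reported = []
reporter = ->(error) { reported << error }

route_error(UpstreamTimeout.new("content store timed out"),
            logger: logger, counts: counts, reporter: reporter)
route_error(RuntimeError.new("unexpected bug"),
            logger: logger, counts: counts, reporter: reporter)
```

Treating unknown errors as high priority by default means a new kind of failure produces a notification until someone decides it belongs in the low-priority list.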

This page was last reviewed on 18 May 2020. It needs to be reviewed again on 18 November 2020 by the page owner #govuk-developers.