Last updated: 13 May 2021

Sentry

Sentry is a service that collates errors that happen on GOV.UK in one place, regardless of which application, what environment or what machine the error occurred on. It is far more convenient than logging on to individual machines and querying their logs.

Useful links:

  • Sentry Organization Stats for an overview of how many errors are being sent/rate-limited site-wide.
  • Sentry Projects for a per-project view, enabling you to see which ones are causing the most errors right now.
  • Sentry Grafana dashboards: Production, Staging and Integration (all require VPN), for a per-environment breakdown of Sentry errors.

Nomenclature

  • Project: each GOV.UK application has its own project on Sentry, where errors originating from that application are consolidated. Example: publishing-api
  • Issue: a group of similar errors. For example, multiple instances of NoMethodError from the same app are likely to be logged under the same issue (see example) in the project in Sentry. In practice, there are a variety of factors that determine whether errors are grouped under the same issue.
  • Error/Exception: a single occurrence of an Issue, containing details such as stack trace, arguments, server IP and error message. See example.

How Sentry is integrated on GOV.UK

Projects can be created and edited in the Sentry UI, but this risks creating inconsistencies or missing apps. We therefore configure projects using govuk-saas-config (and its associated rake tasks), which read a list of apps from govuk-developer-docs and make sure that all configuration is set up correctly.

Apps are configured to talk to Sentry using the govuk_app_config gem, which interfaces with Sentry via its GovukError class. Apps call GovukError.configure - see example. Under the hood, this delegates to the sentry-raven gem. sentry-raven has since been superseded by sentry-ruby, which we plan to migrate to in the future.

Unhandled exceptions are automatically logged to Sentry, but you can also manually report something to Sentry using GovukError.notify.
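As a minimal sketch of manual reporting, the snippet below rescues an error, reports it with GovukError.notify and carries on. The import_record method is hypothetical, and the small GovukError stand-in exists only so the sketch runs without the govuk_app_config gem; in a real app the gem provides GovukError.

```ruby
# Stand-in so this sketch runs without the govuk_app_config gem. In a real app
# the gem defines GovukError and this definition is skipped.
unless defined?(GovukError)
  module GovukError
    # Records what was reported, purely for illustration.
    def self.notified
      @notified ||= []
    end

    def self.notify(error)
      notified << error
    end
  end
end

# Hypothetical method: handle the failure locally, but still report it to Sentry.
def import_record(record)
  Integer(record)
rescue ArgumentError => e
  GovukError.notify(e)
  nil
end
```

Because the exception is rescued, it would never reach Sentry as an unhandled exception, so the explicit notify call is what makes it visible.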

Sentry roles

There are different ‘roles’ in Sentry, with different permissions:

  • Most people with access to Sentry have the “Member” role, which lets them view errors, issues and projects.
  • Those with the “Admin” role can configure teams and projects.
  • Those with the “Manager” role can edit organisation settings and set things like global rate limits.
  • Those with the “Owner” role have unrestricted access to the system and can make billing and plan changes.

Some links in this documentation will only be visible to those with the right permissions. If you need a higher permission level than you already have, the senior tech team can grant it to you.

Deciding whether to log errors in Sentry

We should only capture actionable errors in Sentry, i.e. errors representing an underlying issue that developers can fix. We should not capture 4XX responses, for example, as these indicate a client issue. In fact, sentry-raven ignores several exception types by default - see below.

Ignoring exception types

govuk_app_config has a list of ignorable exceptions (excluded_exceptions). If your application raises an exception on this list, it will not be logged in Sentry. Note that sentry-raven also has its own default list of exceptions to ignore, but govuk_app_config overwrites that list rather than appending to it, so the sentry-raven defaults have no effect on GOV.UK.

Then there is a list of exceptions to ignore if they occur during the nightly data sync: data_sync_excluded_exceptions. These are exceptions that are commonly raised while the databases and content store are periodically unavailable.

When determining whether or not to ignore an exception, the exception chain is inspected. This means if an exception isn’t on the ignore list, but its underlying cause is an exception on the ignore list, then it will be ignored. This applies to excluded_exceptions and to data_sync_excluded_exceptions.
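The chain inspection can be sketched in plain Ruby using Exception#cause. This is an illustration rather than the gem’s actual implementation: the ignore list contents, the method names, and the choice to match subclasses of a listed exception are all assumptions.

```ruby
# Hypothetical ignore list for illustration; the real lists live in govuk_app_config.
EXCLUDED_EXCEPTIONS = ["KeyError"].freeze

# Collects an exception and each of its underlying causes, outermost first.
def exception_chain(exception)
  chain = []
  while exception
    chain << exception
    exception = exception.cause
  end
  chain
end

# True if the exception, or anything in its cause chain, is on the ignore list.
# Matching on ancestors (so subclasses also match) is an assumption.
def excluded?(exception, excluded = EXCLUDED_EXCEPTIONS)
  exception_chain(exception).any? do |e|
    e.class.ancestors.any? { |klass| excluded.include?(klass.name) }
  end
end

# Builds a RuntimeError whose underlying cause is a KeyError.
def wrapped_error
  begin
    {}.fetch(:missing)
  rescue KeyError
    raise "lookup failed"
  end
rescue RuntimeError => e
  e
end
```

Here excluded?(wrapped_error) is true even though RuntimeError is not on the list, because its cause is a KeyError.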

Ignoring environments

There is a hardcoded list of environments in which Sentry is considered to be active. If the SENTRY_CURRENT_ENV environment variable available to your app is not on the active_sentry_environments list, then none of its errors will be logged to Sentry.
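In outline, the check amounts to a list lookup. The sketch below assumes illustrative environment names and a hypothetical sentry_active? method; the real list is the one hardcoded in govuk_app_config.

```ruby
# Hypothetical environment names for illustration only.
ACTIVE_SENTRY_ENVIRONMENTS = %w[integration staging production].freeze

# True if the app's SENTRY_CURRENT_ENV is one of the active environments.
# `env` defaults to the process environment but accepts any hash-like object.
def sentry_active?(env = ENV, active = ACTIVE_SENTRY_ENVIRONMENTS)
  active.include?(env["SENTRY_CURRENT_ENV"])
end
```

An app with no SENTRY_CURRENT_ENV set at all falls outside the list, so under this sketch none of its errors would be logged.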

Advanced Sentry customisation

Each of the lists described above (excluded_exceptions, data_sync_excluded_exceptions and active_sentry_environments) can be customised for your app by appending to it.

You can also run arbitrary code to decide whether or not an error should be logged in Sentry, using the should_capture lambda. Your custom code will be lazily combined with the default evaluators such that if an exception is on any of the ‘excluded exceptions’ lists, it will be excluded (even if your custom should_capture callback returns true).
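The way a custom callback combines with the defaults can be sketched as a chain of lambdas that must all agree. This is not the gem’s code: the names below are hypothetical and the single default evaluator stands in for the real exclusion checks.

```ruby
# Hypothetical ignore list; the real lists live in govuk_app_config.
EXCLUDED_NAMES = ["KeyError"].freeze

# Stand-in for the default evaluators: reject anything on an ignore list.
DEFAULT_EVALUATORS = [
  ->(error) { !EXCLUDED_NAMES.include?(error.class.name) },
].freeze

# Example custom callbacks an app might supply.
ALWAYS_CAPTURE = ->(_error) { true }
ONLY_TIMEOUTS = ->(error) { error.message.include?("timeout") }

# Runs the defaults first, then the custom callback. `all?` stops at the first
# evaluator that returns false, so the custom callback is only called when the
# defaults pass and can never re-include an excluded exception.
def capture?(error, custom_should_capture)
  (DEFAULT_EVALUATORS + [custom_should_capture]).all? { |evaluator| evaluator.call(error) }
end
```

With this arrangement a custom callback can only narrow what is captured: capture?(KeyError.new("x"), ALWAYS_CAPTURE) is still false.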

When errors are received by Sentry

Sentry first fingerprints the error to decide what Issue to group it under. It then checks how many occurrences of that Issue have happened recently, and rate-limits (i.e. ignores) this occurrence if it is happening too frequently. More details below.

Fingerprinting

Errors are grouped into Issues automatically by Sentry. By default, Sentry groups based on stack trace. If no stack trace is available, it will group by exception type instead. Failing that, it will group by error message.
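The fallback order can be illustrated with a small sketch. Sentry’s real grouping algorithm is considerably more involved; the event hash shape and the grouping_key method here are assumptions made purely to show the order of precedence.

```ruby
# Picks what to group an event on: stack trace first, then exception type,
# then error message. `event` is a hypothetical hash with optional
# :stacktrace (array of frames), :type and :message keys.
def grouping_key(event)
  stacktrace = event[:stacktrace]
  return [:stacktrace, stacktrace] if stacktrace && !stacktrace.empty?
  return [:type, event[:type]] if event[:type]

  [:message, event[:message]]
end
```

Two events with identical stack traces but different messages get the same key, while events without a stack trace fall back to their exception type.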

Fingerprint rules can be applied at the project level by editing the “Issue Grouping” page under a project (see example). For example, you can force all errors of the same exception type to have the same fingerprint. On the same screen, you can set stack trace rules, to remove or rename certain ‘frames’ in the trace. In practice, we don’t currently apply any custom rules, after an in-depth investigation found that Sentry’s default grouping was already very accurate (despite it often looking like the same exception is spread across multiple issues).

Stack trace cleaning

Note that sentry-raven automatically cleans up stack traces for the purposes of fingerprinting, so that stack trace frames like:

app/views/welcome/view_error.html.erb in _app_views_welcome_view_error_html_erb__2807287320172182514_65600 at line 1

…get normalised to:

app/views/welcome/view_error.html.erb at line 1

This means if the same ActionView::Template::Error error happens twice, it is grouped into the same issue instead of erroneously treated as separate issues.
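The effect of this normalisation can be reproduced with a regular expression. The pattern below is an illustrative assumption that works on the frame format shown above; it is not the code sentry-raven uses.

```ruby
# Matches the generated method name Rails gives a compiled template,
# e.g. " in _app_views_welcome_view_error_html_erb__2807287320172182514_65600".
COMPILED_TEMPLATE_METHOD = / in _\w+__\d+_\d+(?= at line \d+)/

# Strips the generated method name so that identical template errors
# produce identical frames.
def normalise_frame(frame)
  frame.sub(COMPILED_TEMPLATE_METHOD, "")
end
```

The numeric suffixes in the generated name can differ between processes, which is why leaving them in would split one error across several issues.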

sentry-raven uses a customised version of Rails::BacktraceCleaner to do this. The original BacktraceCleaner performs the same normalisation, but also removes any “framework trace” from the stack trace, leaving only the “application trace” behind. That more aggressive cleaning would actually benefit grouping, as Sentry is very sensitive to stack traces that differ only slightly (which can happen when a dependency is updated or a Ruby version is upgraded). However, removing the framework trace entirely means losing key diagnostic information, so the customised, less aggressive BacktraceCleaner is a better choice.

Rate limiting

GOV.UK is on a legacy billing plan with an account limit of 1000 events per hour. When this limit is reached, subsequent events get discarded, meaning that noisy issues can prevent other, more important issues from being logged.

In practice, it’s more complicated than that. We have a per-project limit of 50%, the lowest that Sentry allows. This means that any one project on Sentry cannot log more than 500 events in an hour before it is rate limited. This is a protection mechanism: at least two projects would need to be recording high-volume errors at the same time for us to risk breaching our account limit. These limits are configured on the Rate Limits page.
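The arithmetic behind the two limits can be worked through in a few lines. The method names and the second project name are hypothetical, and the sketch only models the caps themselves, not how Sentry decides which events to drop.

```ruby
ACCOUNT_LIMIT_PER_HOUR = 1000 # account-wide events per hour
PER_PROJECT_SHARE = 0.5       # per-project limit of 50%

# 50% of 1000 events per hour is 500 events per hour for any one project.
PROJECT_LIMIT_PER_HOUR = (ACCOUNT_LIMIT_PER_HOUR * PER_PROJECT_SHARE).to_i

# Caps each project's hourly event count at the per-project limit.
def accepted_per_project(events_by_project)
  events_by_project.transform_values { |count| [count, PROJECT_LIMIT_PER_HOUR].min }
end

# True if the capped totals are enough to use up the whole account limit.
def account_limit_reached?(events_by_project)
  accepted_per_project(events_by_project).values.sum >= ACCOUNT_LIMIT_PER_HOUR
end
```

A single project sending 5000 events in an hour is capped at 500, leaving half the account quota for everything else; it takes a second noisy project to exhaust it.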

In addition, we’ve set up alerting so that any issue which records 100 or more errors in a one-hour period triggers an alert in #govuk-platform-health. These alerts are configured in the Alerts panel.

Sentry issue actions

In the Sentry UI, you can merge related issues together, resolve issues, or ignore issues (permanently or for a set time period).

Merging Sentry issues

Duplicate issues can be merged by going into the project view, checking the boxes associated with each issue, and clicking “Merge”. This should be done very carefully, as issues cannot be unmerged once they are merged, and the issues are often subtly different and warrant being separate issues. For example, they may have the same exception type but occur in different transactions, so require two separate fixes.

To find out why Sentry has treated two issues as separate, visit each issue and scroll to the bottom of the page to see the “Event grouping information”.

Resolving an issue

When you know you’ve fixed the underlying issue, you should comment on the issue explaining what you’ve done, and then click the “Resolve” button. This removes it from the default Sentry UI, making it less noisy. It also means that if the issue occurs again, Sentry marks it as a regression and emails you.

Ignoring an issue

When you’ve identified an issue and written it up as a Trello card, or are actively working on fixing it, it can be unhelpful for the issue to keep accumulating events (and potentially spamming your Slack channel). In these cases, you should comment on the issue with a link to your card or PR, then ignore the issue. You should also set the “Assignee” to yourself.

You can either click “Ignore” to ignore it permanently, or click the arrow next to it to ignore for a set time, e.g. 1 week. You can also un-ignore an issue later.

Commenting on an issue

Click on the issue, then on the “Activity” tab, where you can leave a comment. Comments support markdown.

Deleting and discarding an issue

If you’ve identified an issue that is high-volume, but is unlikely to be fixed any time soon, you can Delete and Discard the issue by clicking the arrow next to the trash can and selecting “Delete and discard future events”.

This should only be used when the issue is likely to have a significant impact on our Sentry quota. It is possible to “undiscard” the issue later, but this will only capture new events. Any events prior to the “undiscard” action are lost.

Special Sentry accounts

There is a 2ndLineBot member on the members list which is set up so that a weekly Sentry report is sent to the 2nd line email address. This bot account should not be deleted.

GDS-wide usage of Sentry

Sentry is used by several programmes in GDS, not just GOV.UK. A report, GDS use of Sentry.io, covers this in more detail, including documenting some of the limitations of the setup.