This page describes what to do in case of an Icinga alert. For more information, search the govuk-puppet repo for the source of the alert.

Icinga alerts

Nginx 5xx rate too high for many apps/boxes

You can view the 5xx logs across all machines on this dashboard (change the hostname to view different apps).

Spikes

The alert should link to a Graphite graph. Certain applications, such as Whitehall, often have spikes. If you can determine that the alert was caused by a spike, it is best to acknowledge the alert and let a team that is working on the app know (or alert Platform Health).

Scaling up

Sometimes a high 5xx rate is caused by a sudden increase in traffic to the site. You can use the Nginx Requests (AWS) dashboard to see if there is an unusually high number of requests to a particular machine class. If there is, consider scaling up the number of machines available to handle the requests. Note that this is only possible in AWS.

UNKNOWN: INTERNAL ERROR

If the message is “UNKNOWN: INTERNAL ERROR: RuntimeError: no valid datapoints” or “UNKNOWN: INTERNAL ERROR: RuntimeError: no data returned for target”, it probably means that statsd or collectd stopped submitting data for a period. Statsd metrics (those that begin with stats.) don’t get created until the first event of a given type occurs. For infrequently used apps which rarely have errors, the http_5xx counter may never get created. You can force creation by creating a zero-value http_500 counter:

fab $environment -H frontend-1.frontend statsd.create_counter:frontend-1_frontend.nginx_logs.static_publishing_service_gov_uk.http_500
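
For context, a StatsD counter is a small UDP datagram of the form name:value|c. The Python sketch below shows what sending a zero-value counter looks like at that level. It is an illustration only and not the implementation of the fab task: the function names are hypothetical, the host and port (localhost, 8125, the usual StatsD default) are assumptions, and the metric name in the usage comment is shortened.

```python
import socket


def counter_packet(metric, value=0):
    """Format a StatsD counter datagram: <metric>:<value>|c"""
    return "{}:{}|c".format(metric, value).encode("ascii")


def send_counter(metric, value=0, host="localhost", port=8125):
    """Send one counter datagram over UDP (fire and forget).

    host and port are assumptions; use whatever your StatsD listens on.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(counter_packet(metric, value), (host, port))


if __name__ == "__main__":
    # Hypothetical, shortened metric name for illustration only.
    send_counter("frontend-1_frontend.nginx_logs.example.http_500", 0)
```

Because the value is zero, the counter is registered without recording a false error.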

Note that the http_5xx counters are created by carbon-aggregator, so they will automatically be created when a corresponding http_500 counter gets created. You should not create a statsd counter for http_5xx as this will confuse carbon-aggregator.
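
To make the relationship concrete, the sketch below imitates the kind of roll-up an aggregator performs: every counter whose name ends in http_5NN contributes to a sibling http_5xx total. This is an assumption-laden illustration, not the actual carbon-aggregator rule from govuk-puppet: the function name is hypothetical, and summing is assumed as the aggregation method.

```python
import re
from collections import defaultdict

# Matches names ending in http_500, http_502, ... but not http_5xx itself.
STATUS_5NN = re.compile(r"^(?P<prefix>.+)\.http_5\d\d$")


def aggregate_5xx(counters):
    """Sum each <prefix>.http_5NN counter into <prefix>.http_5xx."""
    totals = defaultdict(float)
    for name, value in counters.items():
        match = STATUS_5NN.match(name)
        if match:
            totals[match.group("prefix") + ".http_5xx"] += value
    return dict(totals)
```

Under this model, an http_5xx series only exists once at least one http_5NN input exists, which is why creating a zero-value http_500 counter is enough.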

For collectd metrics (those without a leading stats. prefix), you probably just need to wait for the metric to get created.
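
If you want to confirm whether a metric has started reporting, Graphite's render API can return a series as JSON (format=json), where each datapoint is a [value, timestamp] pair and missing values are null. The sketch below classifies such a response. The function name is hypothetical, and the mapping of the two outcomes onto the two Icinga messages is an assumption based on their wording, not on the check's source.

```python
import json


def classify_render_response(render_json):
    """Classify the JSON body of a Graphite /render?format=json response."""
    series = json.loads(render_json)
    if not series:
        # No series matched the target at all.
        return "no data returned for target"
    values = [value for s in series for value, _timestamp in s["datapoints"]]
    if all(value is None for value in values):
        # The series exists but every datapoint in the window is null.
        return "no valid datapoints"
    return "ok"
```

For example, a body of [] means the metric does not exist yet, while a series containing only null values means it exists but nothing was submitted during the queried window.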

This page was last reviewed on 10 September 2019. It needs to be reviewed again on 10 March 2020 by the page owner #govuk-2ndline.