This page describes what to do in case of an Icinga alert. For more information, you can search the govuk-puppet repository for the source of the alert.
Last updated: 23 May 2022

High Nginx 5xx rate

You can view the 5xx logs across all machines on this dashboard:

Change the hostname to view different apps.

Spikes

The alert should link to a Graphite graph. Some applications, such as Whitehall, are prone to short spikes in 5xx errors. If you can determine that the alert was caused by a spike, acknowledge the alert and let the team working on the app know (or alert Platform Reliability).
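If you want to check the graph's data programmatically rather than by eye, the following is a minimal sketch. It assumes you have already fetched the series from Graphite's render API with `format=json`, which returns datapoints as `[value, timestamp]` pairs with `null` where no data was recorded. The function name, the threshold and the "at most two datapoints above the threshold" definition of a spike are illustrative assumptions, not values taken from the Icinga check.

```python
from typing import List, Optional, Sequence

# One Graphite datapoint as returned by format=json: [value, timestamp].
# The value is None (JSON null) when nothing was recorded for that interval.
Datapoint = Sequence[Optional[float]]


def is_short_spike(datapoints: List[Datapoint], threshold: float, max_points_over: int = 2) -> bool:
    """Return True if the 5xx rate went above `threshold` only briefly.

    `threshold` and `max_points_over` are hypothetical tuning values chosen
    for illustration; they are not the alert's real configuration.
    """
    # Drop intervals with no data before comparing against the threshold.
    values = [point[0] for point in datapoints if point[0] is not None]
    over = [value > threshold for value in values]
    if not any(over):
        return False  # never above the threshold, so there is no spike
    if over[-1]:
        return False  # still above the threshold at the latest point: ongoing
    # Find the longest consecutive run of datapoints above the threshold.
    longest = current = 0
    for flag in over:
        current = current + 1 if flag else 0
        longest = max(longest, current)
    return longest <= max_points_over
```

For example, a series that jumps above the threshold for a single interval and then drops back counts as a spike, whereas one that stays high for several intervals, or is still high at the most recent point, does not.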

Multiple applications reporting errors

If multiple applications are reporting 5xx errors, there is likely to be a common cause. First check whether content-store is returning errors. If content-store is not reporting any errors but the frontend apps that depend on it are (see the bottom of the dashboard), the ARP cache may need to be flushed.
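The triage order described above can be sketched as a small decision function. This is a minimal illustration, assuming you already have a current 5xx rate per app; the function name, the threshold and the returned wording are hypothetical, and the list of frontend apps is passed in rather than hard-coded because the real set is whatever the dashboard shows. It only suggests which check to make next and does not flush anything.

```python
from typing import Dict, Iterable


def suggest_next_step(error_rates: Dict[str, float], frontends: Iterable[str], threshold: float) -> str:
    """Suggest a first check when several apps report 5xx errors.

    error_rates maps an app name to its current 5xx rate. The threshold and
    the returned messages are illustrative assumptions.
    """
    erroring = {app for app, rate in error_rates.items() if rate > threshold}
    if len(erroring) < 2:
        return "fewer than two apps erroring: investigate the affected app directly"
    # A failing content-store is the most likely common cause.
    if "content-store" in erroring:
        return "investigate content-store first"
    # Frontends failing while content-store looks healthy points at the ARP cache.
    if erroring & set(frontends):
        return "frontends erroring without content-store: consider flushing the ARP cache"
    return "multiple apps erroring: look for another shared dependency"
```

Keeping the content-store check ahead of the frontend check mirrors the order of the runbook: rule out the shared upstream service before suspecting the network layer.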

Scaling up

A high 5xx rate is sometimes caused by a sudden increase in traffic to the site. Use the Nginx Requests (AWS) dashboard to check whether a particular machine class is receiving an unusually high number of requests. If it is, consider scaling up the number of machines in that class to handle the load.
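To make "unusually high" concrete, the comparison can be sketched as checking each machine class's current request rate against a normal baseline. This is a minimal sketch under stated assumptions: the baseline figures, the 2x factor and the machine class names used in the usage note are hypothetical, and reading the numbers off the Nginx Requests (AWS) dashboard is left to you.

```python
from typing import Dict, List


def unusually_busy_classes(current: Dict[str, float], baseline: Dict[str, float], factor: float = 2.0) -> List[str]:
    """Return machine classes whose request rate is at least `factor` times their baseline.

    `factor` is an illustrative default, not a documented scaling trigger.
    """
    busy = []
    for machine_class, rate in current.items():
        normal = baseline.get(machine_class)
        # Skip classes with no baseline (or a zero baseline) rather than guessing.
        if normal and rate >= factor * normal:
            busy.append(machine_class)
    return sorted(busy)
```

For example, with current rates of `{"frontend": 900, "backend": 100}` and baselines of `{"frontend": 300, "backend": 90}`, only `frontend` would be flagged as a candidate for scaling up.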