Whitehall error ratio too high
You can find the whitehall request rates, error rates, and durations on this dashboard:
Description
This alert fires when the ratio of requests that end in an error to total requests is above the threshold of 1 in 10. The configuration of the alert can be found here
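As a rough illustration of the threshold check, here is a minimal sketch in Python. It assumes that "error" means a failed request (for example an HTTP 5xx response) and that counts are taken over a single time window. Neither the error definition nor the window is stated here, so treat both as assumptions and check the alert configuration for the real rule. The function name `should_alert` is hypothetical.

```python
# Threshold from this runbook: 1 error in every 10 requests.
THRESHOLD_NUMERATOR = 1
THRESHOLD_DENOMINATOR = 10


def should_alert(error_count: int, total_count: int) -> bool:
    """Return True when errors / total is strictly above 1/10.

    Hypothetical helper: assumes error_count and total_count are request
    counts for the same time window. Cross-multiplying keeps the comparison
    in integers, so it is exact and never divides by zero.
    """
    if total_count <= 0:
        # No traffic in the window, so there is no ratio to alert on.
        return False
    return error_count * THRESHOLD_DENOMINATOR > total_count * THRESHOLD_NUMERATOR


if __name__ == "__main__":
    print(should_alert(5, 100))   # 5% errors: below threshold
    print(should_alert(10, 100))  # exactly 1 in 10: not "above", so no alert
    print(should_alert(15, 100))  # 15% errors: above threshold
```

With these counts, 5 and 10 errors per 100 requests do not fire the alert. 15 errors per 100 requests does.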
Impact
Whitehall is used by civil servants in various departments. A high error rate usually indicates that there’s a problem affecting these users. For example, they may be unable to publish content or create drafts.
Potential resolution steps
You may be able to see error reports in Sentry, but some errors are not captured there. In that case, you might find more information in the Kibana logs.
- If whitehall has been deployed recently and is returning errors, it may need a rollback using the deploy GitHub Action.
- If whitehall is handling an increase in load, you may need to scale it up.
- If the errors come from failing API calls, one of whitehall’s dependencies (e.g. publishing-api) may be the root cause.