Last updated: 3 May 2023
Router error ratio too high
You can find the router request error rates on this dashboard:
- [5xx Router Requests](https://grafana.eks.production.govuk.digital/d/router-requests/router-request-rates-errors-durations?orgId=1)
You can also view the 500+503 error rates across all applications on this dashboard:
This alert fires when the ratio of requests in an error state is above the threshold of 1 in 10. The configuration of the alert can be found [here](https://github.com/alphagov/govuk-helm-charts/blob/main/charts/monitoring-config/rules/router.yaml).
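As a rough illustration of the "1 in 10" threshold, the check can be sketched in Python. This is not the alert's real implementation (that lives in the rule file linked above). The function names and the way request counts are passed in are assumptions made for this example only.

```python
# Illustrative sketch only: the real alert is defined in the monitoring rules,
# not in Python. All names below are hypothetical.

ERROR_RATIO_THRESHOLD = 0.1  # "1 in 10", as described in this page


def error_ratio(error_requests: int, total_requests: int) -> float:
    """Return the fraction of requests that ended in an error state."""
    if total_requests == 0:
        # No traffic means there is nothing to alert on.
        return 0.0
    return error_requests / total_requests


def alert_should_fire(error_requests: int, total_requests: int) -> bool:
    """Return True when the error ratio is strictly above the threshold."""
    return error_ratio(error_requests, total_requests) > ERROR_RATIO_THRESHOLD
```

For example, 11 errors in 100 requests is above the threshold, while exactly 10 in 100 is not, because this sketch treats "above" as strictly greater than.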
Router handles requests to the Rails applications, so publishers, end users, or both could be seeing errors, depending on which application is returning them. Use this dashboard to check which applications have been erroring recently, then look in Kibana for a more detailed view of the specific log messages.
Potential resolution steps
- If an application has been recently updated and is returning errors, it may require a rollback using the app’s deploy GitHub Action https://github.com/alphagov/
- If your application is dealing with a sudden increase in load, you may need to scale it. You can check the following documentation in order to do this.