Last updated: 21 Nov 2024
Raise issues with Reliability Engineering
You may experience an issue with GOV.UK where you need help from a Site Reliability Engineer (SRE). The SREs generally work on the Platform Engineering team.
If you require assistance
Ask in #govuk-platform-engineering.
Understanding what SREs can assist with
The "ask for help" page gives a broad explanation of the different areas of support in GOV.UK.
SREs can help with:
- Scalability and resilience
- Designing or improving monitoring, metrics, tracing and observability of system behaviour
- Troubleshooting complex problems
- Designing new systems or backend (APIs, information storage and processing) features
- Designing for graceful degradation under failure conditions (for example the GOV.UK static mirrors)
- Migrating from legacy systems, for example GOV.UK Puppet
- Upgrading software packages that are end-of-life, have security issues or are no longer fit for purpose
- Advice on how to structure or maintain Terraform modules for managing cloud resources
- Continuous deployment and continuous delivery systems (CI/CD), build and release automation