Technical 2nd Line has four main responsibilities:
- Monitoring the state of the GOV.UK infrastructure
- Investigating and responding to technical bug reports
- Providing first line support to queries from data.gov.uk users
- Taking on urgent work or work that doesn’t necessarily belong in any team
If you’re new to Technical 2nd Line, read about our working patterns, ceremonies and policies.
We have a Technical 2nd Line dashboard showing a high-level overview of the state of the GOV.UK environments. You can also install our Chrome extension if you want a permanently visible overview. You will need to be on the VPN if you are accessing it from home.
We use Icinga to monitor our platform and alert us when things go wrong. Many alerts have corresponding documentation in these developer docs, detailing how to respond.
You should record critical alerts that aren’t easily solved on the GOV.UK Technical 2nd Line Trello board to help inform the Technical 2nd Line tech lead(s) and the GOV.UK SREs. Technical 2nd Line should investigate these alerts during quiet periods; you do not necessarily have to fix them.
Some alerts are urgent enough to warrant immediate attention, such as parts of the site becoming unavailable or large quantities of error pages being served. We use PagerDuty to notify the primary and secondary engineers on Technical 2nd Line during office hours (9:30am to 5:30pm), and on-call engineers outside of office hours.
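The routing rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described, not how PagerDuty is actually configured: the function name `pager_targets`, the role labels, and the assumption that office hours apply Monday to Friday only are all hypothetical.

```python
from datetime import datetime, time

# Office hours as stated above: 9:30am to 5:30pm.
OFFICE_START = time(9, 30)
OFFICE_END = time(17, 30)


def pager_targets(now: datetime) -> list[str]:
    """Return the roles that would be paged for an urgent alert raised at `now`.

    `now` is expected to be in UK local time.
    """
    # Assumption (not stated in the docs): office hours apply Monday to Friday only.
    is_weekday = now.weekday() < 5
    # The end of office hours is treated as exclusive, so 5:30pm counts as out of hours.
    in_office_hours = is_weekday and OFFICE_START <= now.time() < OFFICE_END
    if in_office_hours:
        return ["primary", "secondary"]
    return ["on-call"]
```

For example, `pager_targets(datetime(2024, 1, 8, 10, 0))` (a Monday morning) returns the primary and secondary engineers, while the same call at 6pm or on a Saturday returns the on-call engineers.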
If there is a service outage or loss of functionality to a service (whether external or internal), or a security vulnerability is discovered, Technical 2nd Line will declare an incident.
Zendesk is our support ticketing system. When not dealing with incidents and alerts, we should be working through Zendesk tickets.
Read more about processing Zendesk tickets on Technical 2nd Line.
You will likely need to use Grafana to investigate service issues.
Follow these Slack channels while working on Technical 2nd Line:
- #govuk-2ndline-tech: the main channel for people on Technical 2nd Line
- #govuk-deploy: automatically posted to every time a Staging or Production deploy is done. People also post here manually when putting branches on Integration for testing
- #govuk-developers: a general channel for developers, and a good place to ask questions if you are struggling
- #govuk-replatforming: the channel for the Replatforming team, where the SREs are currently working. However, you should use #govuk-2ndline-tech to contact the RE interruptible person about urgent GOV.UK infrastructure issues