Technical 2nd Line
Technical 2nd Line is the user support function of GOV.UK.
Technical 2nd Line’s main responsibilities are:
- Monitoring the state of the GOV.UK infrastructure
- Investigating and responding to technical bug reports
- Taking on urgent work or work that doesn’t necessarily belong in any team
You’ll be set up in PagerDuty so that you can be called if there are any urgent alerts during working hours. Technical 2nd Line shifts are a great opportunity to learn about the GOV.UK stack.
Every Monday, at least 2 people from GOV.UK - a Primary and Secondary, and usually a Shadow - join the team to work on Technical 2nd Line. There are also standbys for the Primary and Secondary.
All of these roles can be fulfilled by developers and Site Reliability Engineers (SREs), whereas only some can be fulfilled by frontend developers, junior technologists and apprentices. See role specific policies for details.
Technical 2nd Line takes priority over the work you do in your usual team.
Shifts start at 9:30 and end at 17:30. You can check the Technical 2nd Line rota to find out when your shift is.
You are required to attend a daily morning standup with your paired 2nd Line partner and the Technical 2nd Line team. There’s a short retrospective at the handover meeting at the end of your shift.
Standby developers are not expected to attend the standups or the incoming handover. If they’re called onto the shift, they should attend the next standup, and should attend the outgoing handover at the end of the shift.
If there is nothing to action on Technical 2nd Line, you are free to return to your team’s work, and you can attend your usual meetings. Please let the delivery manager and the team know when you’ll be away for long periods, and be respectful of the amount of work your colleagues may have to pick up while you’re away. If there are lots of alerts, a live incident or an urgent Zendesk ticket, you’ll need to prioritise Technical 2nd Line above your meetings.
Rules for Primary, Secondary and On Call
Production Admin access is a prerequisite for joining Technical 2nd Line, unless you’re only shadowing.
Folks start out as a Secondary. After two shifts as a Secondary, they’ll start to fill the Primary role, with some exceptions (see role specific policies).
At this point they’ll also start filling the on-call rota, as a Primary. After a couple of Primary on-call shifts, they’ll start taking the on-call Secondary shifts. The thinking behind them starting out as Primary is that engineers who are new to on-call should be paged first and escalate to the Secondary, for the experience.
Role specific policies
Backend developers and SREs are all expected to be on the in-hours and on-call rota, unless their head of community agrees that they have reason to opt out.
Frontend developers are expected to be on the in-hours rota, unless their head of community agrees that they have reason to opt out. They are not expected to be on the on-call rota. Junior technologists and technologist apprentices are also expected to take part in the in-hours rota and not the on-call rota.
A junior technologist can be a shadow or a secondary, but won’t be a primary.
A technologist apprentice will only fulfil the shadow role. However, if they are confident, they can (at their Line Manager’s discretion) go through the process to get Production Admin access, at which point they can request to be added to the secondary role.
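The role rules above can be summarised as a small lookup. This is only an illustrative sketch: the names `Role`, `IN_HOURS_ROLES`, `ON_CALL_DISCIPLINES` and `allowed_in_hours_roles` are hypothetical and do not correspond to any GOV.UK tooling. It also assumes that frontend developers can fill every in-hours role, which the policy above implies but does not state outright.

```python
from enum import Enum


class Role(Enum):
    SHADOW = "shadow"
    SECONDARY = "secondary"
    PRIMARY = "primary"


# In-hours roles each discipline can normally fill, per the policies above.
# Assumption: frontend developers can fill all in-hours roles.
IN_HOURS_ROLES = {
    "backend developer": {Role.SHADOW, Role.SECONDARY, Role.PRIMARY},
    "sre": {Role.SHADOW, Role.SECONDARY, Role.PRIMARY},
    "frontend developer": {Role.SHADOW, Role.SECONDARY, Role.PRIMARY},
    "junior technologist": {Role.SHADOW, Role.SECONDARY},
    "technologist apprentice": {Role.SHADOW},
}

# Only these disciplines are expected to be on the on-call rota.
ON_CALL_DISCIPLINES = {"backend developer", "sre"}


def allowed_in_hours_roles(discipline: str, has_production_admin: bool) -> set[Role]:
    """Return the in-hours roles a person could fill."""
    roles = set(IN_HOURS_ROLES[discipline])
    if discipline == "technologist apprentice" and has_production_admin:
        # Apprentices with Production Admin access can request the secondary
        # role, at their line manager's discretion.
        roles.add(Role.SECONDARY)
    if not has_production_admin:
        # Production Admin access is required for anything beyond shadowing.
        roles &= {Role.SHADOW}
    return roles
```

For example, `allowed_in_hours_roles("junior technologist", True)` returns the shadow and secondary roles, while anyone without Production Admin access is limited to shadowing.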
Shift swaps, working patterns and sickness
If you need to swap your shift, it’s your responsibility to ensure that adequate cover is in place.
If you need cover for a day or two, arrange a swap for those days with another developer. Please ensure delivery managers are aware of this.
If you need a whole shift swap, arrange this with another developer from your team.
For either of the above, let the Technical 2nd Line delivery manager know, so that they can update the schedule on PagerDuty.
If you cannot make your shift because you’re ill, message the delivery manager and the #govuk-2ndline-tech Slack channel. The corresponding standby developer will then take your place.
If your working patterns are not compatible with a 9:30am to 5:30pm shift, let the Technical 2nd Line team know so they can find extra support.
If you do not work a 5-day week, please talk to your delivery manager to arrange cover with another developer on your team.
Away days and all-staff events
Before attending an all-staff event, team away-day or any other event that could keep you away from your laptop for long periods at a time: try to swap your shift with someone who is not attending (see above).
Otherwise, attending such events is allowed provided you are able to regularly check Zendesk (e.g. on an hourly basis) and are prepared to drop what you’re doing and work on anything urgent that comes in. You must also be contactable by phone so that you can quickly respond to PagerDuty alerts.
Those in a “standby” role don’t need to actively check Zendesk or Slack, but as above, must be contactable by phone and within easy reach of their laptop.
Monitoring
We have a Technical 2nd Line dashboard showing a high level overview of the state of the GOV.UK environments. You can also install our Chrome extension if you want a permanently visible overview.
Grafana
We use Grafana dashboards to monitor the health of our applications and services across our environments (Integration, Staging, Production). Some useful dashboards include:
- Second line, which includes data from our Origin health and Edge health dashboards
- Sidekiq
- Application deployment dashboards
PagerDuty
Some alerts are urgent enough to warrant immediate attention, such as parts of the site becoming unavailable or large quantities of error pages being served. We use PagerDuty to notify the primary and secondary engineers on Technical 2nd Line during office hours (9:30am to 5:30pm), and on-call engineers outside of office hours. We carry out a PagerDuty drill every Wednesday morning at 10am UTC (11am BST).
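As a rough illustration of the split between in-hours and on-call paging, the rule can be sketched as below. This is not how the PagerDuty schedules are actually configured. The function name `paged_group` is hypothetical, the input is assumed to be a UK local time, and the sketch assumes office hours apply on weekdays only and ignores bank holidays.

```python
from datetime import datetime, time

# Office hours as stated above. Assumption: weekdays only, UK local time.
OFFICE_START = time(9, 30)
OFFICE_END = time(17, 30)


def paged_group(local: datetime) -> str:
    """Return which group an urgent alert would page at a UK local time."""
    is_weekday = local.weekday() < 5  # Monday is 0, Friday is 4
    in_hours = is_weekday and OFFICE_START <= local.time() < OFFICE_END
    if in_hours:
        return "in-hours primary and secondary"
    return "on-call engineers"
```

Under these assumptions, an alert at 10:00 on a Wednesday pages the in-hours engineers, while one at 17:30 the same day or at any time on a Saturday goes to the on-call engineers.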
Incidents
If there is a service outage or loss of functionality to a service (whether external or internal), or a security vulnerability is discovered, Technical 2nd Line will declare an incident and write up an incident report. We normally review incidents on Mondays at 2-3pm.
Zendesk
Zendesk is our support ticketing system. When not dealing with incidents and alerts, we should be working through Zendesk tickets.
Read more about processing Zendesk tickets on Technical 2nd Line.
2nd Line Trello Board
We use the GOV.UK Technical 2nd Line Trello board to capture pieces of work 2nd Line are required to do, such as:
- Setting up production access
- Recording technical issues
The board is reviewed during the weekly Technical 2nd Line handover meeting, where developers can talk the next team through any new cards and outstanding issues.
It’s your responsibility to help keep this board up to date for the next 2nd Line team.
At the start of your Technical 2nd Line shift you should:
- Read through the cards under Ongoing issues, useful Info & unexplained events so that you’re aware of any ongoing problems that have already been identified. You should try to investigate these issues when there is nothing more urgent happening. At the end of your shift, please comment on each card to say whether you saw the issue or alert. This will help the Technical 2nd Line leads review the cards over a longer period of time and identify any stale ones.
When creating a new card please include:
- A summary of the issue
- A screenshot of the alert/issue
- Any additional information that may be lost over time, e.g. logs
- Links to related Zendesk tickets and suggested reply to users
- Any investigation you have done so far/steps you have taken as a workaround
This will help inform developers, Technical 2nd Line tech lead(s), and the GOV.UK SREs about known issues.
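One way to keep new cards consistent is to draft the description from a fixed set of fields matching the list above. The sketch below is purely illustrative: `IssueCard` and its field names are hypothetical, and it only builds plain text to paste into a card. It does not call the Trello API.

```python
from dataclasses import dataclass, field


@dataclass
class IssueCard:
    """Hypothetical container for the details a new card should include."""
    summary: str
    screenshot: str  # file name or link to a screenshot of the alert/issue
    additional_info: str = ""  # e.g. log excerpts that may be lost over time
    zendesk_links: list[str] = field(default_factory=list)
    suggested_reply: str = ""
    investigation: str = ""  # investigation so far and any workaround steps

    def description(self) -> str:
        """Render the card details as plain text, flagging missing items."""
        lines = [
            f"Summary: {self.summary}",
            f"Screenshot: {self.screenshot}",
            f"Additional information: {self.additional_info or 'none yet'}",
            "Zendesk tickets: " + (", ".join(self.zendesk_links) or "none yet"),
            f"Suggested reply to users: {self.suggested_reply or 'none yet'}",
            f"Investigation and workarounds: {self.investigation or 'none yet'}",
        ]
        return "\n".join(lines)
```

Calling `IssueCard(summary="Example alert", screenshot="alert.png").description()` produces a six-line description with "none yet" against every item still to be filled in.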
Slack channels
Follow these Slack channels while working on Technical 2nd Line:
- #govuk-2ndline-tech - the main channel for people on Technical 2nd Line
- #govuk-deploy - every Staging/Production deploy is automatically posted here, and people also post manually when putting branches on Integration for testing
- #govuk-developers - this is a general channel for developers and can be a good place to ask questions if you are struggling
- #govuk-platform-engineering - the Platform Engineering team looks after the GOV.UK Kubernetes clusters and base images