2nd line drills
Several areas are important to drill on 2nd line, including some tasks you may not encounter in your mission team. We want developers to have the opportunity to practise these tasks ahead of the real thing and, if you are part of the out-of-hours rota, in preparation for going on call.
Drill publishing emergency banner
Follow the Deploy an emergency banner instructions on Staging.
You’ll need to choose a non-serious and clearly fake news headline. For example:
- CAMPAIGN_CLASS: Death of a notable person
- HEADING: Henry Fielding dies
- SHORT_DESCRIPTION: English novelist and dramatist known for his earthy humour and satire dies, age 47
- LINK: https://en.wikipedia.org/wiki/Henry_Fielding
- LINK_TEXT: More information
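Before filling in the deployment form, it can help to hold the drill values in one place and check that none has been left blank. The following is a minimal sketch only: the helper `missing_banner_fields` is hypothetical and not part of any GOV.UK tooling, and treating all five fields as expected is an assumption made for the drill. The field names and values are the ones from the example above.

```python
# Field names as listed in the example above. Treating all five as
# expected is an assumption for this drill, not a documented rule.
EXPECTED_FIELDS = (
    "CAMPAIGN_CLASS",
    "HEADING",
    "SHORT_DESCRIPTION",
    "LINK",
    "LINK_TEXT",
)


def missing_banner_fields(values):
    """Return the expected field names that are absent or blank in `values`."""
    return [
        name
        for name in EXPECTED_FIELDS
        if not str(values.get(name, "")).strip()
    ]


# The clearly fake headline from the example above.
drill_banner = {
    "CAMPAIGN_CLASS": "Death of a notable person",
    "HEADING": "Henry Fielding dies",
    "SHORT_DESCRIPTION": (
        "English novelist and dramatist known for his earthy humour "
        "and satire dies, age 47"
    ),
    "LINK": "https://en.wikipedia.org/wiki/Henry_Fielding",
    "LINK_TEXT": "More information",
}

if __name__ == "__main__":
    missing = missing_banner_fields(drill_banner)
    print("Missing fields:", missing if missing else "none")
```

The check is deliberately simple: it only confirms that each value is present and non-blank, and says nothing about whether the banner will render correctly.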
Run a Terraform plan
Follow the Deploy Terraform instructions, picking a project at random.
You can run this in any environment, as you’re only running plan, not apply, so you shouldn’t be making any changes.
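If you happen to be running the Terraform CLI locally (an assumption; the Deploy Terraform instructions may describe a different workflow and remain the source of truth), `terraform plan` accepts a `-detailed-exitcode` flag that reports whether the plan found any changes. The sketch below wraps that call; the function names are hypothetical, and it assumes `terraform` is on your `PATH` and that the project directory has already been initialised.

```python
import subprocess

# Meaning of the exit codes returned by `terraform plan -detailed-exitcode`.
PLAN_OUTCOMES = {
    0: "plan succeeded, no changes",
    1: "plan failed with an error",
    2: "plan succeeded, changes present",
}


def describe_plan_exit_code(code):
    """Translate a `terraform plan -detailed-exitcode` exit code into text."""
    return PLAN_OUTCOMES.get(code, f"unexpected exit code {code}")


def run_plan(project_dir):
    """Run `terraform plan` (never `apply`) in `project_dir` and describe the result.

    `project_dir` is a hypothetical path to the Terraform project you picked.
    """
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=project_dir,
        check=False,  # exit code 2 is a normal outcome here, not a failure
    )
    return describe_plan_exit_code(result.returncode)
```

Calling `run_plan("path/to/project")` prints Terraform's usual plan output and returns a one-line summary. Because the command is `plan`, nothing is applied whichever outcome is reported.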
Use a restored database in an app
On Integration or Staging, follow the Restore an RDS instance via the AWS CLI instructions for an app of your choice.
Force failover to GOV.UK mirror and Emergency publishing using the GOV.UK mirror
- Warn in #govuk-2ndline-tech that you’re about to do this, as it will lead to a spike in alerts and will also break continuous deployment for a while (due to Smokey failures).
- Follow the Forcing failover to the GOV.UK mirrors instructions on Integration or Staging.
- To verify that it worked, visit a page at random and purge the page from cache. Reload the page, to see the ‘mirrored’ version of the content. NB: you wouldn’t do this in a real incident, as we’d want to serve Fastly’s cached version for as long as possible.
- Undo your changes so that Nginx handles requests again.
Drill logging into accounts
Make sure you can log into the following accounts:
- The AWS Console
- Your individual Fastly account
- Your individual Statuspage account
- Your individual Logit account
- Shared Heroku account
- Shared CKAN/data.gov.uk publisher account
- Shared Rubygems account
- Shared NPM account
Drill scaling up an application
In preparation for a large spike in traffic, you can increase the number of replicas for an app.
Pick an application and try scaling it up in staging. Don’t forget to revert your change afterwards.
Example PR - Increase content store replica count in staging
Drill 2nd line incident processes
Drill an end to end incident
Decide on a hypothetical incident scenario, e.g. “GOV.UK is down”. Walk through the incident management guidance. Use common sense when following the steps (i.e. don’t actually publish an incident to Statuspage or email stakeholders).
Drill how to communicate when Slack is down
Ensure you know how to communicate with your 2nd line colleagues if Slack is unavailable. See Communicate when Slack is unavailable for details.
Drill special deployment conditions
Deploy from AWS CodeCommit when GitHub is unavailable
Follow the Deploy when GitHub is unavailable instructions.
Drill enabling a code freeze
Choose a continuously-deployed app where you can make a meaningful change to the default branch, such as fixing a typo or merging a Dependabot PR.
Before merging the change, implement a deployment freeze for that app.
View the application’s page in Argo CD in each environment to see whether a deployment happened or not.
Remove the code freeze, then make sure the current version is deployed to all three environments.
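For the final check, one approach is to note down the version shown on the app's Argo CD page in each environment and compare them against the version you expect. This is a minimal sketch under stated assumptions: the helper name and the version strings are hypothetical, the versions are typed in by hand rather than fetched from Argo CD, and "production" is assumed to be the third environment alongside Integration and Staging.

```python
# "production" is an assumption; this document names only Integration and Staging.
ENVIRONMENTS = ("integration", "staging", "production")


def environments_not_on(expected_version, deployed_versions):
    """Return the environments whose recorded version differs from `expected_version`.

    An environment missing from `deployed_versions` counts as not matching.
    """
    return [
        env
        for env in ENVIRONMENTS
        if deployed_versions.get(env) != expected_version
    ]


if __name__ == "__main__":
    # Hypothetical versions noted down by hand after removing the freeze.
    after_unfreeze = {
        "integration": "v42",
        "staging": "v42",
        "production": "v41",
    }
    lagging = environments_not_on("v42", after_unfreeze)
    print("Still to deploy:", lagging if lagging else "none")
```

An empty result means every environment is on the expected version; anything else lists where the deployment still needs to catch up.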
Drill making changes to user accounts
Assign a user to their publisher in data.gov.uk
Log into our shared data.gov.uk publisher account. Pick a publisher and do a hypothetical walk-through of Assign users to publishers.
Change a user’s permissions in Signon
Carry out a hypothetical walk-through of unsuspending a user and resetting a user’s 2FA.
Drill creating and changing redirects
Redirect a route
On Integration or Staging, follow the Removing a route created in the Short URL Manager and Removing a route completely so it can be replaced with another route instructions.
Change a slug and create a redirect
On Integration or Staging, follow the Change a slug and create redirect in Whitehall instructions, picking something at random from one of the groups of entities listed (people, roles, organisations, etc).
Drill modifying a document’s change note
Modify and remove a document’s change note in Whitehall
On Integration or Staging, follow Modify a change note in Whitehall using this document or one of your choice. Once you have successfully updated the change note you can drill removing a change note in Whitehall.
Modify and remove a document’s change note in Content Publisher
On Integration or Staging, follow Modify a change note in Content Publisher using this document. The update dated 30 November 2021 has a bespoke change note that you could try changing. To find it, click “show all updates” at the bottom of the page.
You can also try deleting the change note. Again, ensure you do this on Staging or Integration.
Modify a document’s change note in Publishing API
On Integration or Staging, follow Modify a change note in Publishing API using this document or one of your choice. Once you have successfully updated the change note you can drill removing a change note in Publishing API.
Drill making changes to the homepage
Drill updating homepage popular links
Change the homepage popular links following Update popular links. Open a draft PR, and deploy your branch to integration. Once deployed, check your change and redeploy the previous branch to integration.
Drill updating homepage promotion slots
Follow the Update homepage promotion slots instructions, using an appropriate image and text. Open a draft PR, and deploy your branch to integration. Once deployed, check your change and redeploy the previous branch to integration.
Drill CDN failover
- Warn in #govuk-2ndline-tech that you’re about to do this, because our failover CDN does not have full feature parity with our primary one.
- Ensure that you are connected to the VPN before starting.
- Follow the Fall back to AWS CloudFront instructions for staging only.
- Check if GOV.UK Staging still works correctly after performing the failover.
- Revert your changes when finished.