2nd line drills
Several procedures are worth drilling on 2nd line. Drilling them familiarises developers with each process and confirms that the documented steps still work.
Drill detaching an instance
Follow the Detaching an instance from an Auto Scaling Group guidance.
Drill publishing emergency banner
Follow the Deploy an emergency banner guidance on Staging.
Choose a non-serious, clearly fake news headline. For example:
CAMPAIGN_CLASS: Death of a notable person
HEADING: Henry Fielding dies
SHORT_DESCRIPTION: English novelist and dramatist known for his earthy humour and satire dies, age 47
LINK_TEXT: More information
Drill an end to end incident
Decide on a hypothetical incident scenario, e.g. “GOV.UK is down”. Walk through the incident management guidance. Use common sense when following the steps (i.e. don’t actually publish an incident to Statuspage or email stakeholders).
Deploy from AWS CodeCommit when GitHub is unavailable
Choose an app and decide on an old release tag or branch to deploy. Follow the Deploying from AWS CodeCommit instructions in the Integration or Staging environment.
Run a Terraform plan
Follow the Deploy Terraform instructions, picking a project at random.
You can run this in any environment. Because you’re only running plan (not apply), you shouldn’t be making any changes.
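If you script this drill, it helps to make it impossible to run apply by accident. The sketch below assumes you invoke terraform directly from a checked-out project directory; the real Deploy Terraform process may wrap this differently. The helper names are hypothetical. Only read-only subcommands are allowed. init is included because plan needs an initialised project, and init does not change remote infrastructure.

```python
import subprocess

# Hypothetical guard for the drill: assumes terraform is called directly,
# not through whatever wrapper the Deploy Terraform guidance uses.
READ_ONLY_SUBCOMMANDS = {"init", "plan", "validate"}


def build_terraform_command(subcommand: str, *args: str) -> list[str]:
    """Build a terraform argv, refusing subcommands that could change infrastructure."""
    if subcommand not in READ_ONLY_SUBCOMMANDS:
        raise ValueError(f"refusing to run 'terraform {subcommand}' during a drill")
    return ["terraform", subcommand, *args]


def run_plan(project_dir: str) -> int:
    """Run a non-interactive plan in project_dir and return terraform's exit code."""
    # -input=false stops terraform prompting for missing variables.
    result = subprocess.run(
        build_terraform_command("plan", "-input=false"),
        cwd=project_dir,
    )
    return result.returncode
```

For example, run_plan("path/to/project") (a placeholder path) runs a plan, while build_terraform_command("apply") raises before anything is executed.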
Update homepage promotion slots
Follow the Update homepage promotion slots instructions, using an appropriate image and text. Do this on Integration or Staging.
Use a restored database in an app
Follow the Restore an RDS instance via the AWS CLI instructions for an app of your choice, on Integration or Staging.
Force failover to GOV.UK mirror and Emergency publishing using the GOV.UK mirror
- Warn in #govuk-2ndline-tech that you’re about to do this. It will cause a spike in alerts and will also break continuous deployment for a while (due to Smokey failures).
- Follow the Forcing failover to the GOV.UK mirrors instructions on Integration or Staging.
- To verify that it worked, visit a random page and purge it from the cache. Reload the page to see the ‘mirrored’ version of the content. NB: you wouldn’t do this in a real incident, as we’d want to serve Fastly’s cached version for as long as possible.
- Undo your changes so that Nginx handles requests again.
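The purge step in the verification above can be sketched as a single HTTP request. Fastly supports purging one URL by sending it an HTTP PURGE request, unless the service enforces authenticated purging. This sketch assumes unauthenticated purging is allowed; the function names and example URL are hypothetical, so follow the purge guidance for the environment you are drilling on.

```python
import urllib.request

# Assumption: the CDN accepts an unauthenticated PURGE for a single URL.
# If authenticated purging is enforced, this request will be rejected.


def build_purge_request(url: str) -> urllib.request.Request:
    """Build a PURGE request for a single page URL."""
    return urllib.request.Request(url, method="PURGE")


def purge_page(url: str, timeout: float = 10.0) -> int:
    """Send the PURGE request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for non-2xx responses.
    """
    with urllib.request.urlopen(build_purge_request(url), timeout=timeout) as response:
        return response.status
```

After purge_page("https://www.example.com/some-page") succeeds, reloading that page should show the mirrored version.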
Drill logging into accounts
Make sure you can log into the following accounts:
- Your individual Fastly account
- Your individual Statuspage account
- Your individual Logit account
- Shared Heroku account
- Shared CKAN account
- Shared Rubygems account
- Shared NPM account
Drill how to communicate when Slack is down
Ensure you know how to communicate with your 2nd line colleagues if Slack is unavailable. See “If Slack is unavailable” for details.
Drill scaling up number of workers
To prepare for a spike in traffic, you can increase the number of unicorn workers for an app. See the established connections exceeded guidance for details.
Pick an application and drill scaling up its number of workers (see the example).
You can create a branch of govuk-puppet and deploy that branch to Integration to see the unicorn worker change take effect. Delete the branch and re-deploy the latest release of Puppet when you’re done.
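When choosing a new worker count, the arithmetic is simple. Each unicorn worker is a single-threaded process that serves one request at a time, so an app’s concurrent capacity is the per-machine worker count multiplied by the number of machines running it. This sketch assumes every machine runs the same number of workers. The function names and numbers are illustrative, not taken from govuk-puppet.

```python
import math


def concurrent_capacity(workers_per_machine: int, machines: int) -> int:
    """Requests the app can serve at once: one per unicorn worker."""
    return workers_per_machine * machines


def workers_needed(target_concurrency: int, machines: int) -> int:
    """Smallest per-machine worker count that covers the target concurrency."""
    if machines <= 0:
        raise ValueError("machines must be positive")
    return math.ceil(target_concurrency / machines)
```

For example, to handle 10 concurrent requests across 3 machines you would need workers_needed(10, 3) workers per machine, which is 4, giving a capacity of 12. Each extra worker is a separate process with its own memory, so raising the count also raises memory use on each machine.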
Drill enabling a code freeze
Choose a continuously-deployed app where you can make a meaningful change to the default branch, e.g. fixing a typo, or merging a Dependabot PR.
Either before merging the change, or part way through the continuous deployment process, follow the instructions for implementing a deploy freeze for that app.
Follow the deployment pipeline in Jenkins. Confirm that no further environment deployments are triggered. For example, if you implemented the deploy freeze just after the app was deployed to Staging, confirm that the app was then not automatically deployed to Production.
Remove the code freeze, then manually deploy the changes to all remaining environments so that every environment is in sync.
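The two checks in this drill can be expressed as small functions: confirming that no automatic deployment happened after the freeze began, and listing which environments still need a manual deploy. This is a sketch over a hypothetical record shape; Jenkins does not necessarily expose deployments in this form, and the release names are made up.

```python
from dataclasses import dataclass
from datetime import datetime


# Hypothetical record of one deployment, e.g. transcribed from the Jenkins pipeline.
@dataclass(frozen=True)
class Deployment:
    environment: str
    release: str
    deployed_at: datetime
    automatic: bool


def automatic_deploys_after(deployments: list[Deployment], freeze_started: datetime) -> list[Deployment]:
    """Automatic deployments that ran after the freeze began (should be empty)."""
    return [d for d in deployments if d.automatic and d.deployed_at > freeze_started]


def environments_behind(current_releases: dict[str, str], target_release: str) -> list[str]:
    """Environments not yet on the target release, in the order given."""
    return [env for env, release in current_releases.items() if release != target_release]
```

If the freeze was applied just after the Staging deploy, automatic_deploys_after should return an empty list. After lifting the freeze, environments_behind tells you where to deploy manually.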