govuk-infrastructure: Upgrading the cluster
This is a generic guide to upgrading the cluster to a newer Kubernetes version. It cannot predict future changes, so do not follow it blindly: adapt it to any specific instructions that apply to the version you are upgrading to. Consult the AWS documentation and changelogs before using this procedure.
General outline and things to know
To use this guide, you should know how to apply Terraform. We upgrade the cluster in place, starting with integration, followed by staging, and finally production. Upgrading integration and staging first lets us check that the upgrade goes without problems before it reaches production.
You can only upgrade one minor version at a time: for example, from 1.17 to 1.18, but not from 1.17 to 1.21.
Once you have completed a cluster upgrade, you cannot roll it back. If you need to return to the previous version, you will have to rebuild the cluster from scratch.
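Because each upgrade moves exactly one minor version forward and cannot be undone, it can help to list every intermediate version before you start. The Python sketch below does that. `upgrade_path` and `ENVIRONMENTS` are hypothetical names, and applying each hop to integration, staging and production before starting the next hop is an assumption based on the environment order described above, not a documented rule.

```python
def parse_minor(version: str) -> tuple[int, int]:
    """Split a Kubernetes version such as "1.29" into (major, minor)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def upgrade_path(current: str, target: str) -> list[str]:
    """Return every version you must upgrade through, one minor version at a time.

    Raises ValueError for a downgrade (upgrades cannot be rolled back) or a
    change of major version (not handled by this sketch).
    """
    cur_major, cur_minor = parse_minor(current)
    tgt_major, tgt_minor = parse_minor(target)
    if cur_major != tgt_major:
        raise ValueError("major version changes are not handled by this sketch")
    if tgt_minor <= cur_minor:
        raise ValueError(f"{target} is not newer than {current}; upgrades cannot be rolled back")
    return [f"{cur_major}.{m}" for m in range(cur_minor + 1, tgt_minor + 1)]


# Assumed order: each hop goes through every environment before the next hop starts.
ENVIRONMENTS = ["integration", "staging", "production"]

if __name__ == "__main__":
    for version in upgrade_path("1.17", "1.21"):
        for env in ENVIRONMENTS:
            print(f"Upgrade {env} to {version}")
```

For example, `upgrade_path("1.17", "1.21")` returns four separate upgrades (1.18, 1.19, 1.20 and 1.21), each of which goes through the full procedure below.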
Preparing to upgrade
As a prerequisite for upgrading the cluster, there are two things you should do:
- Check the Elastic Kubernetes Service (EKS) upgrade insights for each cluster
- Read the release notes for the version you're about to upgrade to and note on the story any changes that might affect us. Consider using this template to keep things simple:
```markdown
# Kubernetes VERSION notable changes
Information pulled from LINK TO RELEASE NOTES
❌ Doesn't affect us
❓ Might affect us
✅ Affects us but doesn't need action
⚠️ Affects us and needs action
## Deprecations and removals
❌ Deprecation/removal that doesn't affect us
⚠️ Deprecation/removal that means we need to do some work before or after the upgrade
❓ Deprecation/removal that we need to look into after the upgrade
## Graduations
❓ Graduation that might be good for us in the future
⚠️ Graduation that enables something for us in the immediate term, or that changes something and requires us to change with it
❌ Graduation that has no bearing on us at all
## Summary
A one sentence summary of the changes and how/if they affect us
```
If any of these steps point to an issue that would prevent you upgrading, you should stop here.
If any of these steps point to a change that needs making before the upgrade, raise it with the team and decide how to proceed.
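The upgrade insights check can also be scripted if you want the results in a form you can paste onto the story. This is a minimal sketch, assuming boto3 is installed, your AWS credentials can read the cluster, and the cluster is in eu-west-1. The helper names `fetch_upgrade_insights` and `blocking_insights` are hypothetical; `list_insights` and its `clusterName`, `filter` and `nextToken` parameters belong to the boto3 EKS client. The sketch flags every insight whose status is not PASSING; whether a flagged insight actually blocks the upgrade is still a judgement for the team.

```python
from typing import Any


def fetch_upgrade_insights(cluster_name: str, region: str = "eu-west-1") -> list[dict[str, Any]]:
    """Fetch all UPGRADE_READINESS insights for one EKS cluster, following pagination."""
    import boto3  # imported here so blocking_insights() works without AWS access

    client = boto3.client("eks", region_name=region)
    insights: list[dict[str, Any]] = []
    kwargs: dict[str, Any] = {
        "clusterName": cluster_name,
        "filter": {"categories": ["UPGRADE_READINESS"]},
    }
    while True:
        response = client.list_insights(**kwargs)
        insights.extend(response.get("insights", []))
        token = response.get("nextToken")
        if not token:
            return insights
        kwargs["nextToken"] = token


def blocking_insights(insights: list[dict[str, Any]]) -> list[str]:
    """Return one line per insight that is not PASSING (WARNING, ERROR or UNKNOWN)."""
    problems = []
    for insight in insights:
        status = insight.get("insightStatus", {}).get("status", "UNKNOWN")
        if status != "PASSING":
            problems.append(f"{status}: {insight.get('name', insight.get('id', '?'))}")
    return problems
```

Run `blocking_insights(fetch_upgrade_insights(name))` once per cluster, passing each environment's cluster name.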
Step-by-step procedure
You can upgrade the EKS cluster by changing the version in Terraform and applying the change through Terraform Cloud. Terraform will first upgrade the control plane version, then the node groups, and finally any cluster add-ons. It will pick the most appropriate version for each component.
- Increment `cluster_version` to the version you are upgrading to in `terraform/tfc-configuration/variables-<ENV>.tf`.
- Raise the change as a PR, and merge it.
- Plan and apply a new run in the `tfc-configuration` workspace in Terraform Cloud. This will update the variable sets to the new `cluster_version`.
- Plan and apply a new run in the `cluster-infrastructure-<ENV>` workspace in Terraform Cloud. You should expect it to take 30-45 minutes to upgrade each cluster.
- Lastly, once production is upgraded successfully, update the Kubernetes version in `renovate.json`. A GitHub Action will fail if the version is not updated correctly. This is the line in `renovate.json` you need to update:

```json
"packageNameTemplate": "{\"kubernetesVersion\": \"$UPDATE_NEW_KUBERNETES_VERSION_HERE\", \"addonName\": \"{{depName}}\", \"region\": \"eu-west-1\"}",
```
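The final `renovate.json` edit is easy to get wrong because the version sits inside an escaped JSON string. The Python sketch below sets it from a script. It is an assumption-laden helper, not the check the GitHub Action runs: the function names are hypothetical, and it assumes the version only appears as `kubernetesVersion` inside `packageNameTemplate` strings like the line shown above. It edits the raw text rather than round-tripping through `json.dump`, so the rest of the file keeps its formatting, then re-parses the result to make sure it is still valid JSON.

```python
import json
import re
from pathlib import Path

# Matches kubernetesVersion inside the escaped JSON string held in packageNameTemplate,
# as it appears in the raw file: {\"kubernetesVersion\": \"1.29\", ...}
VERSION_PATTERN = re.compile(r'(\\"kubernetesVersion\\":\s*\\")([^\\"]*)(\\")')


def set_kubernetes_version(renovate_text: str, new_version: str) -> str:
    """Return renovate.json text with every kubernetesVersion set to new_version."""
    updated, count = VERSION_PATTERN.subn(
        lambda m: m.group(1) + new_version + m.group(3), renovate_text
    )
    if count == 0:
        raise ValueError("no kubernetesVersion found in renovate.json")
    json.loads(updated)  # fail loudly if the edit produced invalid JSON
    return updated


def kubernetes_versions(renovate_text: str) -> set[str]:
    """Return the distinct kubernetesVersion values found in the file."""
    return {m.group(2) for m in VERSION_PATTERN.finditer(renovate_text)}


def update_renovate_file(path: Path, new_version: str) -> None:
    """Rewrite renovate.json in place with the new Kubernetes version."""
    path.write_text(set_kubernetes_version(path.read_text(), new_version))
```

For example, `update_renovate_file(Path("renovate.json"), NEW_VERSION)` followed by `kubernetes_versions(...)` on the result should return a single value matching the version production is now running.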