Manually resize ASGs (auto scaling groups)
In AWS, we use auto scaling groups to ensure we have the right number of machines available to handle incoming traffic.
Unfortunately, a number of limitations in the current system mean we can’t use it to scale the number of machines automatically. However, we can manually scale up in advance if we anticipate an increase in traffic. We did this effectively after deploying the “Get ready for Brexit” tool, to ensure we could cope with the load.
Manually scaling up/down
Scaling machines up or down in AWS will trigger Icinga alerts, so let developers in #govuk-2ndline-tech know you are about to do this.
In the EC2 section of the AWS console, select “Auto Scaling Groups” from the bottom of the menu on the left-hand side and find the right machine class in the list (you can filter on the name).
Changing the size of the ASG
Note: If you expect this change to be permanent, raise a PR against govuk-aws-data once it’s all working, so that the number isn’t reset to the old value the next time Terraform is deployed.
In the “Details” tab at the bottom, you will see “Desired Capacity”, “Min” and “Max”, which show the existing configuration. Scroll right and then click on the “Edit” button.
In the box that appears, change the numbers as required. To ensure you get the number of machines you want, it’s best to set all three numbers to the same value.
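The same change could also be scripted rather than made through the console. The following is a minimal sketch, assuming Python with the boto3 library installed and AWS credentials already configured. The helper names and the example group name are hypothetical and are not part of any existing GOV.UK tooling. It sets all three values to the same number, as recommended above.

```python
def build_resize_request(asg_name, size):
    """Build the arguments for a resize with min, max and desired all equal."""
    if size < 0:
        raise ValueError("size must not be negative")
    return {
        "AutoScalingGroupName": asg_name,
        "MinSize": size,
        "MaxSize": size,
        "DesiredCapacity": size,
    }


def resize_asg(asg_name, size):
    """Apply the resize. Assumes boto3 is installed and credentials are set up."""
    import boto3  # imported here so build_resize_request works without boto3

    client = boto3.client("autoscaling")
    client.update_auto_scaling_group(**build_resize_request(asg_name, size))


# Hypothetical usage; "example-frontend" is a made-up group name:
# resize_asg("example-frontend", 6)
```

A scripted change is subject to the same caveat as a console change: a permanent change still needs a PR against govuk-aws-data.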
If you have:
increased the number of instances (a.k.a scale out):
This should trigger the creation of new machines and automatically run the appropriate
To check the machines are recognised, you can use govuk_node_list -c <class> on the jumpbox and check the IP addresses printed match those in the EC2 machine listing (you can filter the listing by machine class and sort by the date created).
If any of the machines aren’t recognised by govuk_node_list, you can destroy the machine and wait for a new one to spawn.
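Comparing the two lists by eye is error-prone when there are many machines. As a rough sketch, assuming you have copied the IP addresses from the EC2 listing and from the govuk_node_list output into two plain lists, the comparison can be done with a set difference. The function name and the addresses below are made up for illustration.

```python
def unrecognised_machines(ec2_ips, node_list_ips):
    """Return EC2 IP addresses that do not appear in the govuk_node_list output."""
    # Strip whitespace and ignore blank lines left over from copy and paste.
    ec2 = {ip.strip() for ip in ec2_ips if ip.strip()}
    known = {ip.strip() for ip in node_list_ips if ip.strip()}
    return sorted(ec2 - known)


# Made-up example addresses
ec2_ips = ["10.1.4.12", "10.1.5.30", "10.1.6.7"]
node_list_ips = ["10.1.4.12", "10.1.6.7"]
print(unrecognised_machines(ec2_ips, node_list_ips))  # prints ['10.1.5.30']
```

Any address it returns belongs to a machine that is a candidate for being destroyed and replaced.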
decreased the number of instances (a.k.a scale in):
The number of instances to be terminated will be equally distributed among the 3 availability zones.
Before any instance is terminated, active connections through the load balancer(s) associated with it are drained, so that the instance can be terminated without negative impact on user traffic. For example, there should be no HTTP 5xx errors when an instance is terminated by an autoscaling scale in.
You may want to speed up the removal of terminated instances in Icinga by following the documentation here
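To illustrate the even distribution across the 3 availability zones mentioned above, here is a simplified model, not the exact AWS algorithm. It assumes each termination is taken from whichever zone currently has the most instances. The zone names and counts are example values.

```python
def scale_in(instances_per_az, to_terminate):
    """Simulate a scale in by removing instances from the fullest zone first."""
    counts = dict(instances_per_az)  # copy so the input is not modified
    if to_terminate > sum(counts.values()):
        raise ValueError("cannot terminate more instances than exist")
    for _ in range(to_terminate):
        fullest = max(counts, key=counts.get)
        counts[fullest] -= 1
    return counts


before = {"eu-west-1a": 3, "eu-west-1b": 3, "eu-west-1c": 3}
print(scale_in(before, 3))
# prints {'eu-west-1a': 2, 'eu-west-1b': 2, 'eu-west-1c': 2}
```

In this model, reducing a group of 9 machines to 6 removes one machine from each zone rather than emptying a single zone.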