Skip to main content
Last updated: 27 Jul 2023

Manually resize ASGs (auto scaling groups)

In AWS, we use auto scaling groups to ensure we have the right number of machines available to handle incoming traffic.

Unfortunately there are a number of limitations with the system currently which means we can’t use it to automatically scale up the number of machines. However, we can manually scale up the number of machines in advance if we anticipate an increase in traffic. This was used effectively after we deployed the “Get ready for Brexit” tool to ensure we would cope with the load.

Manually scaling up/down

  1. Scaling up/down machines in AWS will trigger Icinga alerts so let developers in #govuk-2ndline-tech know you are about to do this.

  2. Access the AWS Console and go to the EC2 service.

  3. Select “Auto Scaling Groups” from the bottom of the menu on the left hand side and find the right machine class in the list (you can filter on the name).

Filtering auto-scaling groups

Changing the size of the ASG

Note: If you anticipate this change being permanent, you should make sure to raise a PR against govuk-aws-data once it’s all working to ensure the number doesn’t get put back to the old value if Terraform gets deployed.

  1. In the “Details” tab at the bottom, you will see “Desired Capacity”, “Min” and “Max” which shows the existing configuration. Scroll right and then click on the “Edit” button.

  2. In the box that appears, change the numbers as required. To ensure you get the right number of machines you want, it’s best to change all three numbers to the same value.

Editing auto-scaling groups

  1. Click “Save”.

  2. If you have:

    1. increased the number of instances (a.k.a scale out):

      This should trigger the creation of new machines and automatically run the appropriate Deploy_Node_Apps jobs.

      To check the machines are recognised, you can use govuk_node_list -c <class> on the jumpbox and check the IP addresses printed match those in the EC2 machine listing (you can filter the listing by machine class and sort by the date created).

      If any of the machines aren’t recognised by govuk_node_list you can destroy the machine and wait for a new one to spawn.

    2. decreased the number of instances (a.k.a scale in):

      The number of instances to be terminated will be equally distributed among the 3 availability zones.

      Before any instance is terminated, any active connection through the load balancer(s) associated with the instance will be drained so that the instance can be terminated without negative impact on user traffic. For e.g. there should be no HTTP 5xx errors when an instance is terminated via autoscaling scale in.

      You may want to speed up the removal of terminated instances in Icinga by following the documentation here