Warning This document has not been updated for a while now. It may be out of date.
Last updated: 5 Apr 2022

Migration to AWS

To bring the GOV.UK platform in line with the guidance detailed in the Service Manual, it has been migrated to Amazon Web Services.

Most services run on Amazon EC2, but there are some differences in the infrastructure that you should be aware of.

Key Differences

Hostnames and DNS

Traditionally we hardcoded hostnames and IPs on each instance in /etc/hosts. In AWS, we are making use of Auto Scaling Groups (ASG) and Elastic Load Balancing (ELB) to connect to instances, and internal DNS using Amazon Route 53 for service resolution.

Hostnames

Traditionally you would see hostnames similar to:

backend-1.backend
frontend-1.frontend
puppetmaster-1.management

Hostnames are now automatically generated by DHCP, and refer to the IP address and region that the instance belongs to:

ip-10-1-4-100.eu-west-1.compute.internal
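Because the generated hostname encodes the private IP address and the region, both can be recovered from the name alone. The following is a minimal sketch, not part of the GOV.UK tooling: the function name parse_ec2_hostname is hypothetical, and the pattern only covers the `<region>.compute.internal` form shown above (us-east-1 uses a different suffix, ec2.internal, which this sketch deliberately rejects).

```python
import ipaddress
import re

# Pattern for the DHCP-generated hostname format shown above.
_HOSTNAME_RE = re.compile(
    r"^ip-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.([a-z0-9-]+)\.compute\.internal$"
)


def parse_ec2_hostname(hostname):
    """Return (ip_address, region) for a DHCP-generated hostname.

    Raises ValueError if the hostname does not follow the expected format.
    """
    match = _HOSTNAME_RE.match(hostname)
    if match is None:
        raise ValueError(f"not a DHCP-generated EC2 hostname: {hostname!r}")
    *octets, region = match.groups()
    # ip_address() rejects out-of-range octets such as 999.
    ip = ipaddress.ip_address(".".join(octets))
    return str(ip), region


print(parse_ec2_hostname("ip-10-1-4-100.eu-west-1.compute.internal"))
# ('10.1.4.100', 'eu-west-1')
```

A traditional hostname such as backend-1.backend does not match the pattern and raises ValueError.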

Please see the documentation about accessing the environment.

Service resolution

Traditionally resolving a service name to an IP would be handled by hardcoding names and IPs in /etc/hosts.

To make use of the dynamic environment in AWS, we are using Amazon Route 53 to resolve service names to their appropriate ELB. Each node group (a set of instances within an Auto Scaling group) will resolve a main service name, along with any application service names that belong to that group. For example, the calculators-frontend node group will resolve calculators-frontend as the service name:

lauramartin@ec2-integration-blue-backend-ip-10-1-5-53:~$ host calculators-frontend
calculators-frontend.integration.govuk-internal.digital is an alias for calculators-frontend.blue.integration.govuk-internal.digital.
calculators-frontend.blue.integration.govuk-internal.digital has address 10.1.6.27
calculators-frontend.blue.integration.govuk-internal.digital has address 10.1.5.238

It will also resolve an application service name, such as finder-frontend:

lauramartin@ec2-integration-blue-backend-ip-10-1-5-53:~$ host finder-frontend
finder-frontend.integration.govuk-internal.digital is an alias for calculators-frontend.integration.govuk-internal.digital.
calculators-frontend.integration.govuk-internal.digital is an alias for calculators-frontend.blue.integration.govuk-internal.digital.
calculators-frontend.blue.integration.govuk-internal.digital has address 10.1.4.222
calculators-frontend.blue.integration.govuk-internal.digital has address 10.1.6.202

The service name first resolves to a record in the top-level environment domain (integration.govuk-internal.digital), which is a CNAME record pointing to a stack-specific DNS record (here, the blue stack).
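The alias chain can be illustrated with a small in-memory model. This is only a sketch: the records are copied from the finder-frontend `host` output above, the CNAMES and A_RECORDS dictionaries and the resolve function are hypothetical names, and a real lookup would of course go through Route 53 rather than a dictionary.

```python
# Records copied from the `host finder-frontend` output above.
CNAMES = {
    "finder-frontend.integration.govuk-internal.digital":
        "calculators-frontend.integration.govuk-internal.digital",
    "calculators-frontend.integration.govuk-internal.digital":
        "calculators-frontend.blue.integration.govuk-internal.digital",
}
A_RECORDS = {
    "calculators-frontend.blue.integration.govuk-internal.digital":
        ["10.1.4.222", "10.1.6.202"],
}


def resolve(name, max_hops=8):
    """Follow CNAME records from name and return (chain, addresses)."""
    chain = [name]
    while name in CNAMES:
        if len(chain) > max_hops:
            raise RuntimeError("CNAME chain too long (possible loop)")
        name = CNAMES[name]
        chain.append(name)
    # The last name in the chain is the stack-specific record for the ELB.
    return chain, A_RECORDS.get(name, [])


chain, addresses = resolve("finder-frontend.integration.govuk-internal.digital")
for hop in chain:
    print(hop)
print(addresses)
```

The application name points at the node group's service name in the environment domain, which in turn points at the stack-specific record that holds the ELB addresses.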

GOV.UK applications use Plek for service discovery. Plek will return the fully-qualified domain name (FQDN) of the service it is discovering.

irb(main):001:0> Plek.find("publishing-api")
=> "https://publishing-api.integration.govuk-internal.digital"

This will resolve to the associated ELB:

lauramartin@ec2-integration-blue-backend-ip-10-1-5-53:~$ host publishing-api.integration.govuk-internal.digital
publishing-api.integration.govuk-internal.digital is an alias for publishing-api.blue.integration.govuk-internal.digital.
publishing-api.blue.integration.govuk-internal.digital has address 10.1.4.215
publishing-api.blue.integration.govuk-internal.digital has address 10.1.5.50

No internal services should be accessed using the external public load balancers from within the internal network.
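One way to catch accidental use of a public hostname is to check that a service URL sits under the internal domain. The sketch below is an illustration only, not an existing GOV.UK check: the function name is_internal_url is hypothetical, and the default domain is the integration one seen in the examples above; the domains for other environments are assumed to follow the same shape and should be passed in explicitly.

```python
from urllib.parse import urlsplit


def is_internal_url(url, internal_domain="integration.govuk-internal.digital"):
    """Return True if the URL's host is the internal domain or a subdomain of it."""
    host = urlsplit(url).hostname or ""
    # Match on a whole-label boundary so look-alike hosts are rejected.
    return host == internal_domain or host.endswith("." + internal_domain)


print(is_internal_url("https://publishing-api.integration.govuk-internal.digital"))
print(is_internal_url("https://www.gov.uk/api/content"))
```

The first call refers to the internal name returned by Plek in the example above, and the second to a public hostname.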

We are unable to set the internal domain as the default because some applications perform Plek lookups on themselves, and the result affects how the application is presented to the user. Until this behaviour is changed within the applications, we have determined it is safer to set specific overrides for services.

Please see the related ADR for DNS Infrastructure for further detail.

Databases

PostgreSQL and MySQL

We are using Amazon Relational Database Service (RDS) to host PostgreSQL and MySQL databases.

We use a number of application specific db_admin nodes (for example: whitehall_db_admin) as a way for Puppet to manage MySQL and PostgreSQL RDS instances. They also run nightly backups to S3 using govuk_env_sync.

DocumentDB

Applications in AWS are gradually being migrated from self-hosted MongoDB to Amazon DocumentDB. Notable differences include:

  • DocumentDB implements the MongoDB 3.6 API, whereas our self-hosted MongoDB is version 2.4.
  • DocumentDB instances do not support unauthenticated connections; they require a username and password.
  • Storage is allocated and scaled automatically.
  • DocumentDB does not support arbitrary binary data in fields of type String because it doesn’t allow strings to contain NUL characters (U+0000).
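The NUL restriction in the last point can be checked before a document is written. The following is a minimal sketch under stated assumptions: the function name find_nul_strings and the sample document are hypothetical, the document is assumed to be a plain structure of dicts, lists and strings, and only string values (not keys) are inspected.

```python
def find_nul_strings(document, path=""):
    """Yield the path of every string value that contains a NUL character."""
    if isinstance(document, str):
        if "\x00" in document:
            yield path
    elif isinstance(document, dict):
        for key, value in document.items():
            child = f"{path}.{key}" if path else str(key)
            yield from find_nul_strings(value, child)
    elif isinstance(document, (list, tuple)):
        for index, value in enumerate(document):
            yield from find_nul_strings(value, f"{path}[{index}]")


# Hypothetical document with two offending string values.
sample = {
    "title": "ok",
    "details": {"body": "bad\x00data", "tags": ["a", "b\x00"]},
}
print(list(find_nul_strings(sample)))
# ['details.body', 'details.tags[1]']
```

Values that hold genuinely binary data would need to be stored in a binary field type, or encoded, rather than as a String.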

DocumentDB instances are managed and backed up via the db_admin bastion hosts, in the same way as PostgreSQL and MySQL.

Redis

We are using Amazon ElastiCache instead of managing our own Redis instances.

Architecture changes

Removal of load balancer tiers

Because we use Elastic Load Balancing, we no longer need to maintain our own nginx load balancers, so these have been removed from the stack. See the related ADR for further details.

Merging of MySQL database servers

Traditionally, we had a separate MySQL server for Whitehall. Rather than manage multiple RDS instances, we have merged this into the main MySQL server. See the relevant ADR for details.

Deploying infrastructure

Terraform is used to manage our Infrastructure as Code. This code is stored in govuk-aws.

See the documentation to make and deploy changes to the infrastructure.

Automated application deployments

If an EC2 instance is terminated, it will be automatically rebuilt by the autoscaling group. If an instance runs deployable applications, it will automatically start a deployment of the applications it runs using the Deploy_App Jenkins job. It deploys the deployed-to-<environment> branch, and runs the deploy:without_migrations task.

If an instance is having issues, terminating the instance may be the quickest way of ensuring a clean redeploy of the applications.

Be aware that for instances that run a lot of applications, this may block ongoing deployments because of the time it takes to deploy multiple applications.

For a list of what applications run on which instance types, see the node_class: entry in the relevant hieradata for the environment: