Migration to AWS
To bring the GOV.UK platform in line with the guidance in the Service Manual, we have migrated it to Amazon Web Services (AWS).
Most services run on Amazon EC2, but there are some differences in the infrastructure that you should be aware of.
Key Differences
Hostnames and DNS
Traditionally we hardcoded hostnames and IPs on each instance in /etc/hosts. In AWS, we use Auto Scaling Groups (ASGs) and Elastic Load Balancing (ELB) to connect to instances, and internal DNS in Amazon Route 53 for service resolution.
Hostnames
Traditionally you would see hostnames similar to:
backend-1.backend
frontend-1.frontend
puppetmaster-1.management
Hostnames are now generated automatically by DHCP and encode the private IP address and region of the instance:
ip-10-1-4-100.eu-west-1.compute.internal
Please see the documentation about accessing the environment.
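To illustrate the new hostname format, here is a minimal Ruby sketch that recovers the private IP address and region from a DHCP-generated hostname. The parse_ec2_hostname helper and Ec2Host struct are hypothetical, and the sketch only handles the ip-A-B-C-D.<region>.compute.internal form shown above (AWS uses a different suffix in some regions, such as ec2.internal in us-east-1).

```ruby
require "ipaddr"

# Hypothetical result type: the private IPv4 address and region from a hostname.
Ec2Host = Struct.new(:ip, :region)

# Parses hostnames of the form ip-A-B-C-D.<region>.compute.internal.
# Returns nil for anything else, such as the old backend-1.backend style.
def parse_ec2_hostname(hostname)
  match = /\Aip-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.([a-z0-9-]+)\.compute\.internal\z/.match(hostname)
  return nil unless match

  # The first label encodes the IP with dashes; IPAddr rejects out-of-range octets.
  ip = IPAddr.new(match[1..4].join("."))
  Ec2Host.new(ip.to_s, match[5])
rescue IPAddr::InvalidAddressError
  nil
end
```

For example, parse_ec2_hostname("ip-10-1-4-100.eu-west-1.compute.internal") gives an ip of "10.1.4.100" and a region of "eu-west-1".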
Service resolution
Traditionally, resolving a service name to an IP was handled by hardcoding names and IPs in /etc/hosts.
To make use of the dynamic environment in AWS, we use Amazon Route 53 to resolve service names to their appropriate ELB. Each node group (a set of instances within an Auto Scaling group) resolves a main service name, along with the service names of any applications that belong to that group. For example, the calculators-frontend node group resolves calculators-frontend as its main service name:
lauramartin@ec2-integration-blue-backend-ip-10-1-5-53:~$ host calculators-frontend
calculators-frontend.integration.govuk-internal.digital is an alias for calculators-frontend.blue.integration.govuk-internal.digital.
calculators-frontend.blue.integration.govuk-internal.digital has address 10.1.6.27
calculators-frontend.blue.integration.govuk-internal.digital has address 10.1.5.238
The node group also resolves the service names of its applications, such as finder-frontend:
lauramartin@ec2-integration-blue-backend-ip-10-1-5-53:~$ host finder-frontend
finder-frontend.integration.govuk-internal.digital is an alias for calculators-frontend.integration.govuk-internal.digital.
calculators-frontend.integration.govuk-internal.digital is an alias for calculators-frontend.blue.integration.govuk-internal.digital.
calculators-frontend.blue.integration.govuk-internal.digital has address 10.1.4.222
calculators-frontend.blue.integration.govuk-internal.digital has address 10.1.6.202
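A service name first resolves under the top-level environment domain (integration.govuk-internal.digital), where it is a CNAME record pointing to a stack-specific DNS record (calculators-frontend.blue.integration.govuk-internal.digital in the examples above). An application service name such as finder-frontend adds one more hop, because it is itself a CNAME to its node group's main service name.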
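The record chain above can be modelled as a simple lookup. The following Ruby sketch follows CNAME records until it reaches an address record, using an in-memory copy of the records from the finder-frontend example. It is illustrative only: the CNAMES, A_RECORDS and SEARCH_DOMAIN constants and the resolve method are hypothetical, and real lookups go through Route 53 and the instance's resolver.

```ruby
# Hypothetical in-memory copy of the records from the finder-frontend lookup.
# CNAMES maps an alias to its target; A_RECORDS maps a final name to addresses.
CNAMES = {
  "finder-frontend.integration.govuk-internal.digital" =>
    "calculators-frontend.integration.govuk-internal.digital",
  "calculators-frontend.integration.govuk-internal.digital" =>
    "calculators-frontend.blue.integration.govuk-internal.digital",
}.freeze

A_RECORDS = {
  "calculators-frontend.blue.integration.govuk-internal.digital" => ["10.1.4.222", "10.1.6.202"],
}.freeze

# Assumed resolver search domain: this is what lets a bare name such as
# "finder-frontend" be looked up under the environment domain.
SEARCH_DOMAIN = "integration.govuk-internal.digital"

# Follows CNAMEs until it reaches an A record. Returns [chain_of_names, addresses].
def resolve(name, max_hops: 10)
  fqdn = name.include?(".") ? name : "#{name}.#{SEARCH_DOMAIN}"
  chain = [fqdn]
  while (target = CNAMES[chain.last])
    raise "CNAME chain too long (or looping) for #{fqdn}" if chain.size > max_hops
    chain << target
  end
  addresses = A_RECORDS.fetch(chain.last) { raise KeyError, "no A record for #{chain.last}" }
  [chain, addresses]
end
```

Calling resolve("finder-frontend") returns a three-name chain ending at calculators-frontend.blue.integration.govuk-internal.digital, together with that record's two addresses.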
GOV.UK applications use Plek for service discovery. Plek returns a URL containing the fully-qualified domain name (FQDN) of the service it is looking up.
irb(main):001:0> Plek.find("publishing-api")
=> "https://publishing-api.integration.govuk-internal.digital"
The hostname in that URL resolves to the associated ELB:
lauramartin@ec2-integration-blue-backend-ip-10-1-5-53:~$ host publishing-api.integration.govuk-internal.digital
publishing-api.integration.govuk-internal.digital is an alias for publishing-api.blue.integration.govuk-internal.digital.
publishing-api.blue.integration.govuk-internal.digital has address 10.1.4.215
publishing-api.blue.integration.govuk-internal.digital has address 10.1.5.50
Within the internal network, do not access internal services through the external public load balancers.
We cannot set the internal domain as the default because some applications look themselves up with Plek, and the result affects how those applications are presented to the user. We have decided it is safer to set overrides for specific services until the applications change this behaviour.
Please see the related ADR for DNS Infrastructure for further detail.
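The override approach can be sketched as follows: services default to a non-internal domain, and only services on an explicit list are discovered on the internal domain. This is a hypothetical Ruby illustration, not Plek's implementation; DEFAULT_DOMAIN, INTERNAL_OVERRIDES and service_uri are assumed names, and the default domain value is a placeholder.

```ruby
require "uri"

# Illustrative values only: the real domains and override list live in
# GOV.UK's configuration, not here.
DEFAULT_DOMAIN     = "integration.external.example"
INTERNAL_DOMAIN    = "integration.govuk-internal.digital"
INTERNAL_OVERRIDES = %w[publishing-api].freeze

# Builds a service URL: overridden services use the internal domain, and
# everything else falls back to the default domain.
def service_uri(service)
  domain = INTERNAL_OVERRIDES.include?(service) ? INTERNAL_DOMAIN : DEFAULT_DOMAIN
  URI::HTTPS.build(host: "#{service}.#{domain}")
end
```

Keeping the override list explicit means each service is switched to the internal domain deliberately, rather than every self-referencing lookup changing at once.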
Databases
PostgreSQL and MySQL
We are using Amazon Relational Database Service (RDS) to host PostgreSQL and MySQL databases.
We use a number of application-specific db_admin nodes (for example, whitehall_db_admin) so that Puppet can manage the MySQL and PostgreSQL RDS instances. These nodes also run nightly backups to S3 using govuk_env_sync.
DocumentDB
Applications in AWS are gradually being migrated from self-hosted MongoDB to Amazon DocumentDB. Notable differences include:
- DocumentDB implements the MongoDB 3.6 API, whereas our self-hosted MongoDB is version 2.4.
- DocumentDB instances do not support unauthenticated connections; they require a username and password.
- Storage is allocated and scaled automatically.
- DocumentDB does not support arbitrary binary data in fields of type String, because it does not allow strings to contain NUL characters (U+0000).
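Before moving a collection to DocumentDB, it can help to find string fields that contain NUL characters. This Ruby sketch walks a parsed document (nested hashes and arrays) and returns the dotted paths of any strings containing a NUL byte. The nul_string_paths helper is hypothetical and operates on plain Ruby data, not on a live database.

```ruby
# Returns the dotted paths of String values that contain a NUL byte,
# recursing through nested Hashes and Arrays (array indexes become path parts).
def nul_string_paths(value, path = [])
  case value
  when Hash
    value.flat_map { |key, v| nul_string_paths(v, path + [key.to_s]) }
  when Array
    value.each_with_index.flat_map { |v, i| nul_string_paths(v, path + [i.to_s]) }
  when String
    # Check raw bytes so strings with invalid encodings are handled too.
    value.bytes.include?(0) ? [path.join(".")] : []
  else
    []
  end
end
```

Fields it reports would need to be stored differently, for example as binary data rather than as strings, before the data is migrated.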
DocumentDB instances are managed and backed up via the db_admin bastion hosts, in the same way as PostgreSQL and MySQL.
Redis
We use Amazon ElastiCache instead of managing our own Redis instances.
Architecture changes
Removal of load balancer tiers
Because we use Elastic Load Balancing, we no longer need to maintain our own nginx load balancers, so we have removed them from the stack. See the related ADR for further details.
Merging of MySQL database servers
Traditionally, we had a separate MySQL server for Whitehall. Rather than manage multiple RDS instances, we have merged this into the main MySQL server. See the relevant ADR for details.
Deploying infrastructure
We use Terraform to manage our infrastructure as code. This code is stored in the govuk-aws repository.
See the documentation for how to make and deploy changes to the infrastructure.
Automated application deployments
If an EC2 instance is terminated, its Auto Scaling group automatically replaces it. If the instance runs deployable applications, the new instance automatically starts a deployment of each of them using the Deploy_App Jenkins job. This deploys the deployed-to-<environment> branch and runs the deploy:without_migrations task.
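As a rough illustration of what the automatic deployment asks Jenkins to do, this Ruby sketch builds a URL for Jenkins' buildWithParameters remote API endpoint for one application. The Jenkins host and the parameter names (TARGET_APPLICATION, TAG, DEPLOY_TASK) are assumptions, not taken from the real Deploy_App job, and the sketch only builds the URL; it does not send the POST request that would trigger a build.

```ruby
require "uri"

# Hypothetical Jenkins host; the real job's URL and parameters may differ.
JENKINS_BASE = "https://deploy.example.internal"

# Builds a Deploy_App buildWithParameters URL for one application, using the
# branch and task described above. Parameter names are assumptions.
def deploy_app_request(app, environment)
  params = {
    "TARGET_APPLICATION" => app,
    "TAG"                => "deployed-to-#{environment}", # branch to deploy
    "DEPLOY_TASK"        => "deploy:without_migrations",  # skip DB migrations
  }
  uri = URI.join(JENKINS_BASE, "/job/Deploy_App/buildWithParameters")
  uri.query = URI.encode_www_form(params)
  uri
end
```

An instance running several applications needs one such build per application, which is why rebuilding those instances takes longer.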
If an instance is having issues, terminating it may be the quickest way to get a clean redeployment of its applications.
Be aware that, for instances that run many applications, this may block ongoing deployments, because deploying multiple applications takes time.
For a list of which applications run on which instance types, see the node_class: entry in the relevant hieradata for each environment.
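The hieradata can also be queried programmatically. This Ruby sketch loads a node_class: entry and builds an index from each application to the node classes that run it. The YAML layout (a list of apps under each node class) and the HIERADATA excerpt are assumptions for illustration, so check the real hieradata before relying on it.

```ruby
require "yaml"

# Hypothetical hieradata excerpt; the real file's layout may differ.
HIERADATA = <<~YAML
  node_class:
    calculators_frontend:
      apps:
        - calculators
        - finder-frontend
    backend:
      apps:
        - publishing-api
YAML

# Builds an application => [node classes] index from the node_class: entry.
# aliases: true allows YAML anchors, which hieradata files sometimes use.
def node_classes_by_app(yaml_text)
  node_class = YAML.safe_load(yaml_text, aliases: true).fetch("node_class")
  node_class.each_with_object({}) do |(klass, config), index|
    Array(config && config["apps"]).each { |app| (index[app] ||= []) << klass }
  end
end
```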