This page was set to be reviewed before 2018-04-22 by the page owner: #govuk-2ndline. This might mean the content is out of date. Read how to review a page.

GOV.UK in AWS

To bring the GOV.UK platform in line with the guidance detailed in the Service Manual, it is being migrated to Amazon Web Services.

This is a transitional step before GOV.UK is migrated to the Government PaaS.

Most services run on Amazon EC2, but there are some differences in the infrastructure that you should be aware of.

Key differences

Hostnames and DNS

Traditionally, we hardcoded hostnames and IPs on each instance in /etc/hosts. In AWS, we use Auto Scaling Groups (ASG) and Elastic Load Balancing (ELB) to connect to instances, and internal DNS in Amazon Route 53 for service resolution.

Hostnames

Traditionally you would see hostnames similar to:

backend-1.backend
frontend-1.frontend
puppetmaster-1.management

Hostnames are now automatically generated by DHCP, and refer to the IP address and region that the instance belongs to:

ip-10-1-4-100.eu-west-1.compute.internal
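
As a minimal sketch of how the IP address and region can be read back out of such a hostname, the Ruby below parses the format shown above. The helper name parse_ec2_hostname and the constant EC2_PRIVATE_DNS are hypothetical and not part of any GOV.UK tooling; the pattern only covers the ip-A-B-C-D.<region>.compute.internal form used in this example.

```ruby
require "ipaddr"

# Matches only the private DNS form shown above:
# ip-A-B-C-D.<region>.compute.internal
EC2_PRIVATE_DNS =
  /\Aip-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.([a-z0-9-]+)\.compute\.internal\z/

# Hypothetical helper: returns { ip:, region: }, or nil if the name
# does not have the expected shape or the octets are not a valid IPv4 address.
def parse_ec2_hostname(hostname)
  match = EC2_PRIVATE_DNS.match(hostname)
  return nil unless match

  # IPAddr rejects out-of-range octets such as 999.
  ip = IPAddr.new(match[1..4].join("."))
  { ip: ip.to_s, region: match[5] }
rescue IPAddr::InvalidAddressError
  nil
end

info = parse_ec2_hostname("ip-10-1-4-100.eu-west-1.compute.internal")
puts "#{info[:ip]} in #{info[:region]}"
```

A traditional hostname such as backend-1.backend does not match the pattern, so the helper returns nil for it.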

Please see the documentation about accessing the environment.

Service resolution

Traditionally, service names were resolved to IPs by hardcoding names and IPs in /etc/hosts.

To make use of the dynamic environment in AWS, we use Amazon Route 53 to resolve service names to the appropriate ELB. Each node group (a set of instances within an autoscaling group) has a main service name that resolves to its ELB, along with the service names of any applications that belong to that group. For example, the calculators-frontend node group uses calculators-frontend as its main service name:

lauramartin@ec2-integration-blue-backend-ip-10-1-5-53:~$ host calculators-frontend
calculators-frontend.integration.govuk-internal.digital is an alias for calculators-frontend.blue.integration.govuk-internal.digital.
calculators-frontend.blue.integration.govuk-internal.digital has address 10.1.6.27
calculators-frontend.blue.integration.govuk-internal.digital has address 10.1.5.238

The same ELB is also reachable through an application service name, such as calendars:

lauramartin@ec2-integration-blue-backend-ip-10-1-5-53:~$ host calendars
calendars.integration.govuk-internal.digital is an alias for calculators-frontend.integration.govuk-internal.digital.
calculators-frontend.integration.govuk-internal.digital is an alias for calculators-frontend.blue.integration.govuk-internal.digital.
calculators-frontend.blue.integration.govuk-internal.digital has address 10.1.5.238
calculators-frontend.blue.integration.govuk-internal.digital has address 10.1.6.27

The service name first resolves under the top-level environment domain name (integration.govuk-internal.digital), where it is a CNAME record pointing to a stack-specific DNS record. Please see the documentation about the concept of stacks in the infrastructure.
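
The alias chain described above can be sketched offline in Ruby. This is an illustration only: the record table is copied from the host calendars output earlier in this section, and the helper follow_aliases is a hypothetical name that walks an in-memory hash rather than querying Route 53.

```ruby
# Records copied from the `host calendars` output above.
RECORDS = {
  "calendars.integration.govuk-internal.digital" =>
    { cname: "calculators-frontend.integration.govuk-internal.digital" },
  "calculators-frontend.integration.govuk-internal.digital" =>
    { cname: "calculators-frontend.blue.integration.govuk-internal.digital" },
  "calculators-frontend.blue.integration.govuk-internal.digital" =>
    { a: ["10.1.5.238", "10.1.6.27"] }
}.freeze

# Hypothetical helper: follow CNAMEs until an address record is found.
# Returns nil for unknown names or chains longer than max_hops.
def follow_aliases(name, records, max_hops: 8)
  chain = [name]
  max_hops.times do
    record = records[chain.last]
    return nil if record.nil?
    return { chain: chain, addresses: record[:a] } if record.key?(:a)

    chain << record[:cname]
  end
  nil
end

result = follow_aliases("calendars.integration.govuk-internal.digital", RECORDS)
puts result[:chain].join(" -> ")
puts result[:addresses].join(", ")
```

The chain goes from the application name, to the node group name in the environment domain, to the stack-specific (blue) record that holds the ELB addresses.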

GOV.UK applications use Plek for service discovery. Plek returns a URL built from the fully-qualified domain name (FQDN) of the service it is discovering:

irb(main):001:0> Plek.find("publishing-api")
=> "https://publishing-api.integration.govuk-internal.digital"

This will resolve to the associated ELB:

lauramartin@ec2-integration-blue-backend-ip-10-1-5-53:~$ host publishing-api.integration.govuk-internal.digital
publishing-api.integration.govuk-internal.digital is an alias for publishing-api.blue.integration.govuk-internal.digital.
publishing-api.blue.integration.govuk-internal.digital has address 10.1.4.215
publishing-api.blue.integration.govuk-internal.digital has address 10.1.5.50

No internal services should be accessed using the external public load balancers from within the internal network.

We are unable to set the internal domain as the default because some applications look themselves up through Plek, and the result affects how those applications are presented to the user. We have determined it is safer to set specific overrides for services until this behaviour is changed within the applications.
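
The idea of a default domain with per-service overrides can be sketched as follows. This is a hypothetical stand-in to illustrate the behaviour described above, not Plek's implementation or configuration mechanism: the class name ServiceLocator, its parameters, and the placeholder default domain example.test are all assumptions.

```ruby
# Hypothetical stand-in for the lookup behaviour described above;
# this is not Plek's implementation.
class ServiceLocator
  # default_domain: domain used for any service without an override.
  # overrides: map of service name to the full URL it should resolve to.
  def initialize(default_domain:, overrides: {})
    @default_domain = default_domain
    @overrides = overrides
  end

  # Returns the override if one is set, otherwise a URL on the default domain.
  def find(service)
    @overrides.fetch(service) { "https://#{service}.#{@default_domain}" }
  end
end

locator = ServiceLocator.new(
  default_domain: "example.test", # placeholder, not a real GOV.UK domain
  overrides: {
    "publishing-api" => "https://publishing-api.integration.govuk-internal.digital"
  }
)

puts locator.find("publishing-api")
puts locator.find("calendars")
```

Only services with an explicit override point at the internal domain, so an application that looks itself up keeps receiving the default address.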

Please see the related ADR for DNS Infrastructure for further detail.

Databases

PostgreSQL and MySQL

We are using Amazon Relational Database Service (RDS) to host PostgreSQL and MySQL databases.

To run Puppet against these databases, we have a new instance class: db_admin

Both PostgreSQL and MySQL databases are managed through this instance. The db_admin instances are also used to take textual backups of the databases, which are then stored in an Amazon S3 bucket.

Transition has its own class for management: transition_db_admin

Please see the documentation about administering RDS databases.

Redis

We are using Amazon ElastiCache instead of managing our own Redis instances.

Architecture changes

Removal of LB tiers

Because we use Elastic Load Balancing, we no longer need to maintain our own NGINX load balancers, so these have been removed from the stack. See the related ADR for further details.

Merging MySQL databases

Traditionally we had a separate MySQL server for Whitehall. Rather than manage multiple RDS instances, we have merged this into the main MySQL server. See the relevant ADR for details.

Deploying infrastructure

Terraform is used to manage our Infrastructure as Code. This code is stored in govuk-aws.

See the documentation to make and deploy changes to the infrastructure.

Automated application deployments

If an EC2 instance is terminated, the autoscaling group automatically replaces it. If the new instance runs deployable applications, it automatically starts a deployment of those applications using the Deploy_App Jenkins job. The job deploys the deployed-to-<environment> branch and runs the deploy:without_migrations task.
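
As a rough sketch of what gets requested for a replacement instance, the Ruby below builds one deployment request per application from the branch and task named above. The DeployRequest struct, its field names, and the redeploy_requests helper are hypothetical; they are not the actual parameters of the Deploy_App Jenkins job.

```ruby
# Hypothetical shape of a deployment request; the field names are
# illustrative and not the real Deploy_App job parameters.
DeployRequest = Struct.new(:application, :branch, :task, keyword_init: true)

# Builds one request per application, using the branch and task
# named in the paragraph above.
def redeploy_requests(applications, environment)
  applications.map do |application|
    DeployRequest.new(
      application: application,
      branch: "deployed-to-#{environment}",
      task: "deploy:without_migrations"
    )
  end
end

redeploy_requests(["calendars"], "integration").each do |request|
  puts "#{request.application}: #{request.branch} (#{request.task})"
end
```

Each application on the instance produces its own request, so an instance that runs many applications produces many deployments.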

If an instance is having issues, terminating it may be the quickest way of ensuring a clean redeploy of the applications.

Be aware that on instances that run a lot of applications, this may block ongoing deployments because of the time it takes to deploy multiple applications.

For a list of what applications run on which instance types, see the relevant hieradata.
