All the repositories involved in transition have been tagged with govuk-transition on GitHub.
High level overview
Source diagram in the GOV.UK architecture folder.
Transition data sources
Pre-transition traffic data is imported from pre-transition-stats, based on logs provided by transitioning organisations. That data was updated periodically by hand, but these updates have now ended.
Traffic data is automatically imported every hour from transition-stats. This
import puts a high load on the database. CDN logs for the “Production Bouncer”
Fastly service are streamed to
logs-cdn.publishing.service.gov.uk (which is
logs-cdn-1.management in Production), processed there by a cron job and
pushed to the GitHub repository.
On logs-cdn-1.management, both the log files and the cache files produced by the
processing script are rotated. These files should be compressed and archived.
Bouncer is a Ruby/Rack web app that receives requests for the URLs of government
sites that have either been transitioned to GOV.UK, archived or removed. It queries
the database it shares with Transition and replies with a redirect, an archive page
or a 404 page.
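The lookup-and-reply behaviour described above can be sketched roughly as follows. This is an illustration only, not Bouncer's actual code (Bouncer is a Ruby/Rack app). The `Mapping` class, the `kind` labels and the `respond` function are hypothetical names, and the 301 and 410 status codes are assumptions; the description above only specifies a redirect, an archive page or a 404 page.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Mapping:
    """Hypothetical stand-in for one mapping row in the shared database."""
    kind: str                      # assumed labels: "redirect" or "archive"
    new_url: Optional[str] = None  # only meaningful for redirects


def respond(mapping: Optional[Mapping]) -> Tuple[int, Dict[str, str], str]:
    """Return (status, headers, body) for the mapping found for a request."""
    if mapping is None:
        # No mapping for this URL: serve the 404 page.
        return 404, {}, "404 page"
    if mapping.kind == "redirect" and mapping.new_url:
        # Assumed to be a permanent redirect.
        return 301, {"Location": mapping.new_url}, ""
    if mapping.kind == "archive":
        # Assumed status code for the archive page.
        return 410, {}, "archive page"
    # Anything unrecognised falls back to the 404 page.
    return 404, {}, "404 page"
```

For example, `respond(Mapping("redirect", "https://www.gov.uk/example"))` produces a redirect response whose `Location` header is the new URL, while `respond(None)` produces the 404 page.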
Transition is a Rails app that allows users in transitioning organisations and at GDS to view, add and edit the mappings used by Bouncer. It also presents traffic data sourced from CDN logs and logs provided by transitioning organisations (though this latter activity has now ended).
When sites transition they are generally CNAMEd to a domain we control that points to our CDN (an A record is used for root domains which can’t be CNAMEd).
Some sites partially transition, which means that they redirect some paths to their AKA domain, which is CNAMEd to us.
GDS doesn’t control the DNS for most transitioned domains, except for some domains such as
*.alphagov.co.uk. We are working on providing a more exhaustive list. If the DNS
for a particular transitioned site isn’t configured correctly we need to inform
the responsible department so they can fix it themselves.
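When checking whether a transitioned site's DNS is configured correctly, the rule above (a CNAME to a domain we control, or an A record for root domains) can be expressed as a small check. This is a sketch under stated assumptions: the function name, the shape of the `records` dictionary, and the example target and IP addresses are all hypothetical, and the DNS lookup itself is left to whatever tool you normally use.

```python
from typing import Dict, List


def dns_points_at_cdn(records: Dict[str, List[str]],
                      cname_target: str,
                      cdn_ips: List[str]) -> bool:
    """Check already-resolved DNS records for one hostname.

    `records` maps a record type ("CNAME" or "A") to its values.
    `cname_target` and `cdn_ips` are the expected values (hypothetical here).
    """
    # Normalise names: DNS names are case-insensitive and may end in a dot.
    cnames = [name.rstrip(".").lower() for name in records.get("CNAME", [])]
    if cnames:
        return cname_target.rstrip(".").lower() in cnames
    # Root domains can't be CNAMEd, so fall back to comparing A records.
    addresses = records.get("A", [])
    return bool(addresses) and set(addresses) <= set(cdn_ips)
```

If the check fails for a domain whose DNS GDS doesn't control, the fix still has to come from the responsible department.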
Bouncer has a separate CDN service at Fastly (“Production Bouncer”) from the main GOV.UK one, and it’s configured by a separate Jenkins job which adds domains to and removes domains from the service. That job fetches the list of domains which should be configured at the CDN from Transition’s hosts API, so it will fail if that API is unavailable.
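The add-and-remove step of that job amounts to comparing two lists of domains. The sketch below shows the idea only and is not the job's real implementation: the function name and arguments are hypothetical, and fetching the host list from Transition or the configured domains from Fastly is deliberately left out.

```python
from typing import Iterable, List, Optional, Tuple


def plan_domain_changes(desired: Optional[Iterable[str]],
                        configured: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Work out which domains to add to and remove from the CDN service.

    `desired` is the host list from Transition (None if it could not be
    fetched); `configured` is what the CDN service currently has.
    """
    if desired is None:
        # Fail rather than treat a missing host list as "no domains".
        raise RuntimeError("host list unavailable; refusing to change the CDN service")
    desired_set = {domain.lower() for domain in desired}
    configured_set = {domain.lower() for domain in configured}
    to_add = sorted(desired_set - configured_set)
    to_remove = sorted(configured_set - desired_set)
    return to_add, to_remove
```

Failing when the host list is missing, rather than treating it as empty, avoids removing every domain from the service by accident.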
Bouncer runs on 3 machines in the
redirector vDC,
and they are load-balanced at the vShield Edge rather than by a separate machine.
Bouncer’s traffic does not go through the
cache-* nodes: the CDN proxies all requests to
bouncer.publishing.service.gov.uk, which points to its vShield Edge.
It uses an Nginx default vhost so that requests for all domains are passed on to the application; there’s generally no Nginx configuration for individual transitioned sites (but see Special cases below).
In the case of a data centre failure, within the disaster recovery (DR) vCloud organisation we have:
- Bouncer application servers which read from the DR database slave
- a second PostgreSQL slave for the Transition database
Bouncer is a small application, and so long as its dependencies are present the only thing to do if it’s erroring is to restart it.
Bouncer reads from the
transition_production database by connecting to
transition-postgresql-slave-1.backend. It authenticates using its own
PostgreSQL role, which is granted
SELECT permissions on all tables.
That role is further restricted to connecting only to the slave because the
configuration needed to allow it isn’t present on the master.
Special cases
- We reverse-proxy requests for some paths on www.mhra.gov.uk to the old site because some tools had not yet been redeveloped when the site transitioned and needed to continue to be served. The old site is often slow to respond and may time out. This proxying is handled by Nginx, so these requests are not routed to Bouncer.
- We serve some assets which were previously on directgov and businesslink via Nginx on the Bouncer machines. The assets live in two repos which are fetched and rsynced to the machines when Bouncer is deployed.