Our content delivery network (CDN)
GOV.UK uses Fastly as a CDN. Citizen users don’t access GOV.UK servers directly; they connect via the CDN. This is better because:
- The CDN “edge nodes” (webservers) are closer to end users. Fastly has servers all around the world but our “origin” servers are only in the UK.
- It reduces load on our origin. Fastly uses Varnish to cache responses.
The CDN is responsible for retrying requests against the static mirror.
Most of the CDN config is versioned and scripted:
Some configuration isn’t scripted, such as logging. The www, bouncer and assets services send logs to S3 and stream them to monitoring-1. These logging endpoints are configured directly in the Fastly UI. There is documentation on how to query the CDN logs.
The main www.gov.uk cache is Varnish, which Fastly run for us.
Varnish lets us configure our caching logic with VCL (Varnish config language).
It also lets us do things like restricting connections to staging to permitted IPs, forcing SSL and blocking IP addresses.
We set a default TTL of 5000s on cached objects. This means that pages such as the GOV.UK homepage will be cached for about 83 minutes. 5XX responses get cached for 1s; mirror responses get cached for 15 minutes.
We also set a grace period of 24 hours. So if the homepage server is down, we’ll continue to serve a stale homepage for 24 hours.
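The TTL and grace values above can be sketched in Python to make the arithmetic concrete. This is a minimal illustration, not the VCL we run: the function names are hypothetical, and two details are assumptions, namely that the mirror TTL takes precedence over the 5XX TTL, and that stale content is only served while origin is unhealthy.

```python
DEFAULT_TTL_S = 5000        # default TTL from the text (about 83 minutes)
ERROR_TTL_S = 1             # 5XX responses
MIRROR_TTL_S = 15 * 60      # responses served from the static mirror
GRACE_S = 24 * 60 * 60      # grace period


def ttl_for(status: int, from_mirror: bool = False) -> int:
    """Pick a TTL in seconds. The order of these checks is an assumption."""
    if from_mirror:
        return MIRROR_TTL_S
    if 500 <= status <= 599:
        return ERROR_TTL_S
    return DEFAULT_TTL_S


def cache_decision(age_s: int, ttl_s: int, origin_healthy: bool) -> str:
    """Return 'fresh', 'stale' (served under grace) or 'fetch'."""
    if age_s < ttl_s:
        return "fresh"
    if not origin_healthy and age_s < ttl_s + GRACE_S:
        return "stale"
    return "fetch"


if __name__ == "__main__":
    print(DEFAULT_TTL_S // 60, "whole minutes of default TTL")
    print(cache_decision(6000, ttl_for(200), origin_healthy=False))
```

Under these assumptions, a homepage object that is 6000 seconds old is past its TTL but still inside the grace window, so it would be served stale if origin were down.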
We will cache any non-GET/HEAD request that returns a 404 or 405 status for the default TTL. This means (for example) that a POST request that returns a 405 (Method Not Allowed) will be cached.
These are the GET request status codes that Varnish caches automatically: 200, 203, 300, 301, 302, 404 or 410. See the Varnish docs for more detail. We have added to these: see the repo VCL for special handling of certain status codes, and for the most up-to-date version of what we’re running in Fastly. Refer to the Varnish 2.1 documentation when looking at the VCL code.
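As a rough illustration, the two rules described above (the default cacheable statuses, and the 404/405 rule for other methods) can be written as a small Python predicate. The function name is hypothetical, treating HEAD the same as GET is an assumption, and the sketch deliberately ignores the extra status handling in the repo VCL (for example the 1s TTL for 5XX responses), so treat the VCL as the source of truth.

```python
# Statuses listed in the text as cached automatically for GET requests.
DEFAULT_CACHEABLE = frozenset({200, 203, 300, 301, 302, 404, 410})
# Statuses cached for any method other than GET/HEAD, per the text.
NON_GET_CACHEABLE = frozenset({404, 405})


def matches_listed_rules(method: str, status: int) -> bool:
    """True if the response is cacheable under the two rules listed above."""
    if method.upper() in ("GET", "HEAD"):  # assumption: HEAD behaves like GET
        return status in DEFAULT_CACHEABLE
    return status in NON_GET_CACHEABLE


if __name__ == "__main__":
    for method, status in [("GET", 200), ("POST", 405), ("POST", 200)]:
        print(method, status, matches_listed_rules(method, status))
```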
VCL can be tricky to get right. When making changes to the VCL, add smoke tests to smokey and check that they don’t fail in staging.
You can also use Fastly’s Fiddle tool to test manually, or test your changes with cURL by including a debug header:
curl -svo /dev/null -H "Fastly-Debug:1" https://www.gov.uk
This will give you various debugging headers that may be useful:
< Fastly-Debug-Path: <nodes you hit>
< Fastly-Debug-TTL: <nodes with TTL>
< Fastly-Debug-Digest: <hash>
< X-Served-By: <node that responded>
< X-Cache: HIT, HIT
< X-Cache-Hits: 1
< X-Timer: <time it took>
< Vary: Accept-Encoding, Accept-Encoding
See the Varnish/Fastly docs for what these mean. Check out the Fastly debugging guide for more details on testing.
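If you want to check these headers in a script rather than by eye, the verbose cURL output can be parsed with a few lines of Python. This is a sketch only: the helper names and the sample text are made up for illustration, and it assumes you have captured cURL's verbose output (which goes to stderr) as a string.

```python
def parse_response_headers(verbose_output: str) -> dict[str, str]:
    """Collect response headers (lines starting with '< ') from curl -v output."""
    headers = {}
    for line in verbose_output.splitlines():
        # Skip request lines ('> ...'), status lines and anything without a colon.
        if not line.startswith("< ") or ":" not in line:
            continue
        name, _, value = line[2:].partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def hit_on_any_node(headers: dict[str, str]) -> bool:
    """True if any cache node listed in X-Cache reported a HIT."""
    states = [s.strip() for s in headers.get("x-cache", "").split(",")]
    return "HIT" in states


# Hypothetical sample, shaped like the headers listed above.
SAMPLE = """\
> GET / HTTP/2
< HTTP/2 200
< X-Cache: HIT, HIT
< X-Cache-Hits: 1
< Vary: Accept-Encoding
"""

if __name__ == "__main__":
    parsed = parse_response_headers(SAMPLE)
    print(parsed["x-cache"], hit_on_any_node(parsed))
```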
Fastly’s IP ranges
Fastly publish their cache node IP address ranges as JSON from their API. We use these IP addresses in 2 places:
- Origin has firewall rules in place so that only our office and Fastly can connect.
- Our Fastly Varnish config restricts HTTP purges to specific IP addresses (otherwise anyone would be able to purge the cache).
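Both uses come down to checking whether a client address falls inside one of the published ranges. Here is a minimal Python sketch of that check. It assumes the JSON has `addresses` and `ipv6_addresses` lists of CIDR strings; the sample payload uses reserved documentation ranges rather than real Fastly ranges, and the function names are hypothetical.

```python
import ipaddress
import json

# Illustrative payload only: documentation ranges, not Fastly's real ones.
SAMPLE_RESPONSE = """
{
  "addresses": ["192.0.2.0/24", "198.51.100.0/24"],
  "ipv6_addresses": ["2001:db8::/32"]
}
"""


def load_networks(payload: str) -> list:
    """Parse the JSON payload into a list of IPv4/IPv6 network objects."""
    data = json.loads(payload)
    cidrs = data.get("addresses", []) + data.get("ipv6_addresses", [])
    return [ipaddress.ip_network(cidr) for cidr in cidrs]


def is_allowed(ip: str, networks: list) -> bool:
    """True if the address falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


if __name__ == "__main__":
    networks = load_networks(SAMPLE_RESPONSE)
    print(is_allowed("192.0.2.10", networks))   # inside a sample range
    print(is_allowed("203.0.113.5", networks))  # outside every sample range
```

In practice the firewall rules and the VCL purge restriction do this matching themselves; the sketch only shows the logic they apply.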
Banning IP addresses at the CDN edge
We occasionally decide to ban an IP address at our CDN edge if it exhibits any of the following behaviour:
- not respecting our robots.txt directives
- repeatedly receiving 429 (rate limit) error responses from origin and not slowing down
- making suspicious requests like attempting SQL injection queries
Banning IPs shouldn’t be taken lightly. An IP address can be shared by multiple user devices, and the user behind an IP address can change over time, so there’s always a chance that we block a legitimate user when we ban an IP address.
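To find candidates for the second behaviour (repeated 429 responses), you could count rate-limited responses per client IP. The Python sketch below assumes the CDN logs have already been reduced to `(ip, status)` pairs; that record shape, the function name and the threshold are all hypothetical. It only produces a list for a human to review and does not ban anything.

```python
from collections import Counter


def rate_limited_ips(records, threshold=100):
    """Return IPs that received at least `threshold` 429 responses, sorted."""
    counts = Counter(ip for ip, status in records if status == 429)
    return sorted(ip for ip, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    # Hypothetical records using documentation IP addresses.
    sample = [("192.0.2.1", 429)] * 3 + [("192.0.2.2", 429), ("192.0.2.3", 200)]
    print(rate_limited_ips(sample, threshold=3))
```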
Bouncer’s Fastly service
A Fastly CDN service can normally handle up to 1000 domains (this limit was undocumented).
We have asked Fastly to increase this limit for Bouncer’s service a few times as the number of domains it handles grew, and the limit is currently 3500. We have about 2000 domains so shouldn’t need to increase it again for a while.
If we reach the limit then the Jenkins job to update Bouncer’s CDN config should fail and new domains won’t be added to the service.
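For a rough sense of the headroom, the figures in this section can be plugged into a short Python calculation. It uses the approximate counts quoted here (a limit of 3500, about 2000 domains, and at least 4 domains per newly configured site), so the result is an upper bound on remaining sites rather than an exact number. The function name is hypothetical.

```python
DOMAIN_LIMIT = 3500      # current limit on Bouncer's service
CURRENT_DOMAINS = 2000   # "about 2000" in the text
DOMAINS_PER_SITE = 4     # "at least 4" per site, so real headroom may be lower


def remaining_sites(limit: int, current: int, per_site: int) -> int:
    """Whole sites that still fit under the domain limit."""
    return max(limit - current, 0) // per_site


if __name__ == "__main__":
    print(remaining_sites(DOMAIN_LIMIT, CURRENT_DOMAINS, DOMAINS_PER_SITE))
```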
Configuring a new site in Transition generally adds at least 4 domains
to the service, including the
aka domain for each real domain. For
New solution for Bouncer and Fastly
Fastly’s new solution to get around the domain limit is a “service pinned map”.
They have created a map which we access using
Domains that need to be transitioned can
CNAME to this domain. It
also has 4 IP addresses assigned, which at the time of writing are
the same as the
A records at that hostname:
Domains do not need to be added to the “Production Bouncer” Fastly service like they used to be.