How logging works on GOV.UK
For information on how to log in and view stacks, please see the GOV.UK Logit documentation.
Elasticsearch in AWS runs as a managed service. Logs are exported to AWS CloudWatch and retained for 3 days.
Logs are also written to an S3 bucket, which is used to import the logs into Logit.
Each machine runs Elastic Filebeat, and independently ships logs to the Logit-provided logstash endpoint.
Filebeat tails logs every 10 seconds and can output to a variety of destinations. It is fully incorporated into the Elastic ecosystem.
We use the filebeat::prospector defined type to create the filebeat configuration on each instance.
Logstream and Logship
We have a defined type in our Puppet code which uses logship to tail logfiles. We only use Logstream to send nginx metrics, via statsd, to Graphite.
In the future this will be replaced.
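For readers unfamiliar with statsd, the sketch below shows roughly what a metric sent "via statsd" looks like on the wire. It is an illustration of the plain-text statsd line format only, not a description of how Logstream is implemented. The metric names are hypothetical, and the host and port are assumptions (8125 is the usual statsd default).

```python
import socket


def statsd_counter(name: str, value: int = 1) -> bytes:
    """Encode a counter increment as '<name>:<value>|c'."""
    return f"{name}:{value}|c".encode("ascii")


def statsd_timer(name: str, millis: int) -> bytes:
    """Encode a timing in milliseconds as '<name>:<value>|ms'."""
    return f"{name}:{millis}|ms".encode("ascii")


def send(packet: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one metric over UDP (fire and forget).

    host and port are assumptions for this sketch.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))


# Hypothetical metric names, for illustration only.
print(statsd_counter("nginx.example.http_200"))
print(statsd_timer("nginx.example.request_time", 42))
```

statsd aggregates these packets and periodically flushes the results to Graphite, which is why the metrics end up there.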
Kibana is the interface for viewing logs in Elasticsearch. Use the Logit interface to log in to Kibana.
There’s some documentation on useful Kibana queries for 2nd line.
Fastly sends logs to multiple locations for the www, assets and bouncer services:
- via syslog to the monitoring-1 boxes in all environments (/var/log/cdn), available immediately
- to an S3 bucket per environment, available every 10 minutes
Data from statsd goes to Graphite instances and is then displayed using Grafana.
Analytics through Athena
This documentation is adapted from the Email Alert API analytics documentation, which also uses Athena.
Athena is Amazon’s service for querying files in S3 using an SQL-like syntax. The logs bucket is set up to be crawled by AWS Glue every day in order to discover new partitions and configure the schema correctly.
Athena is accessible through the AWS control panel by following the instructions for accessing the AWS console.
To access the production data you will need to use the
govuk-infrastructure-production account. Once there, you can head to
Athena and select the
Always query with partitions
You should always query with a where condition which defines the partitions
to be used in your result set, e.g. WHERE year=2018 AND month=7 AND date=4,
unless you are sure you need a wider data range.
The data is stored in directories which separate the data by year, month and
date values. By applying a partition to the query, such as
WHERE year=2018 AND month=7 AND date=4, you reduce the data the query has to
traverse to just the files from that single day, which makes the query
substantially quicker.
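If you build queries from a script, a small helper can make it harder to forget the partition condition. The following is a minimal sketch, assuming the partition columns are named year, month and date as in the example above. The function name partition_clause is hypothetical.

```python
from datetime import date


def partition_clause(day: date) -> str:
    """Return the WHERE predicates that limit a query to one day's partition.

    Assumes integer partition columns named year, month and date.
    """
    return f"year={day.year} AND month={day.month} AND date={day.day}"


print("WHERE " + partition_clause(date(2018, 7, 4)))
# prints: WHERE year=2018 AND month=7 AND date=4
```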
Each query against Athena has a monetary cost - at time of writing $5 per TB of data scanned - and by using partitions you massively reduce the data that needs to be scanned.
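To get a feel for the difference partitions make, the arithmetic can be sketched as below. The price is the figure quoted above and may be out of date. The data volumes are hypothetical, and treating a TB as 1024^4 bytes is an assumption of this sketch.

```python
PRICE_PER_TB_USD = 5.0     # figure quoted in this document; check current pricing
BYTES_PER_TB = 1024 ** 4   # assumption: binary terabyte
BYTES_PER_GB = 1024 ** 3


def query_cost_usd(bytes_scanned: int) -> float:
    """Estimate the cost of a query from the number of bytes it scans."""
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


# Hypothetical volume: 10 GB of logs per day.
one_day = 10 * BYTES_PER_GB
one_year = 365 * one_day

print(f"single day partition: ${query_cost_usd(one_day):.2f}")
print(f"full year, no partition: ${query_cost_usd(one_year):.2f}")
```

With these made-up volumes, the unpartitioned query scans 365 times as much data, so it costs 365 times as much.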
Number of requests per TLS version
SELECT "tls_client_protocol", count(*)
FROM "fastly_logs"."govuk_www"
WHERE "year" = 2018 AND "month" = 8 AND "date" = 5
GROUP BY "tls_client_protocol";
Number of errors returned during an incident
SELECT "status", count(*)
FROM "fastly_logs"."govuk_www"
WHERE "year" = 2018 AND "month" = 8 AND "date" = 5
  AND "request_received" BETWEEN timestamp '2018-08-05 10:00:00' AND timestamp '2018-08-05 11:00:00'
GROUP BY "status";
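During an incident it can be handy to generate the query above for an arbitrary time window. The sketch below is one possible way to do that, assuming the window falls within a single day so that one partition covers it. The function name incident_status_query is hypothetical; the table and column names are taken from the query above.

```python
from datetime import datetime


def incident_status_query(start: datetime, end: datetime) -> str:
    """Build a status-count query for a time window within a single day."""
    if end < start:
        raise ValueError("end must not be before start")
    if start.date() != end.date():
        # Keeps the sketch simple: one day means exactly one partition.
        raise ValueError("window must fall within a single day")

    fmt = "%Y-%m-%d %H:%M:%S"
    start_ts = start.strftime(fmt)
    end_ts = end.strftime(fmt)
    return "\n".join([
        'SELECT "status", count(*)',
        'FROM "fastly_logs"."govuk_www"',
        f'WHERE "year" = {start.year} AND "month" = {start.month} AND "date" = {start.day}',
        f"  AND \"request_received\" BETWEEN timestamp '{start_ts}' AND timestamp '{end_ts}'",
        'GROUP BY "status";',
    ])


print(incident_status_query(datetime(2018, 8, 5, 10), datetime(2018, 8, 5, 11)))
```

Rejecting windows that span midnight is a deliberate simplification; a wider window would need a condition covering every partition involved.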