How we test GOV.UK

Warning: this document has not been updated for a while. It may be out of date.

Last updated: 8 Jul 2024

GOV.UK has several layers of testing:

Continuous Deployment checks
- Smokey
- Application Healthchecks
Continuous Integration checks
- Contract Tests
- Unit, Integration, etc. Tests

Recommended reading: A new standard of testing for GOV.UK.

This manual is about how we currently test GOV.UK. We will never have “perfect” tests, but we can do better than what we have now, for example by:

- Having more types of tests that run before a change is merged, so we can learn about issues earlier in the development process.
- Investing in other ways to test changes, so that we are less reliant on expensive end-to-end tests for things like CDN config (tech debt).

Continuous Deployment checks

Smokey

Smokey is the smoke test suite for GOV.UK.

Smoke tests are meant to be “probes”: their purpose is to monitor real environments for transient failures. By contrast, a “test” should be run in a temporary, isolated environment. We use Smokey as both a suite of probes and a suite of “surrogate” end-to-end tests:

- Probes. A full run of Smokey is triggered every few minutes in every environment, and sends a Slack alert if it fails. This should prompt an engineer to go and fix the problem. However, many of the probes are unreliable, so they are not currently used to page people.
- Surrogate Tests. A subset of the probes is run as part of the Continuous Deployment pipeline. These can fail for reasons unrelated to the app being targeted, which is why we consider this a “surrogate” form of testing.
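
The two uses above can be sketched in plain Ruby. This is only an illustration and not how Smokey itself is implemented: the `Probe` struct, the app tags and the `failed_probes` and `probes_for` helpers are all hypothetical names, and the checks are stand-in lambdas rather than real HTTP requests.

```ruby
# Hypothetical sketch: not Smokey's real implementation.
# A probe has a name, the apps it exercises, and a callable check.
Probe = Struct.new(:name, :apps, :check)

# A probe passes only if its check returns a truthy value without raising.
def passes?(probe)
  probe.check.call ? true : false
rescue StandardError
  false
end

# Run the given probes and return the names of those that failed.
def failed_probes(probes)
  probes.reject { |probe| passes?(probe) }.map(&:name)
end

# Select the subset of probes that exercise one app, as a deployment
# pipeline might do before promoting a release of that app.
def probes_for(probes, app)
  probes.select { |probe| probe.apps.include?(app) }
end

# Stand-in checks: real probes would make requests to a live environment.
PROBES = [
  Probe.new("homepage loads", ["frontend"], -> { true }),
  Probe.new("search returns results", ["search-api", "frontend"], -> { true }),
  Probe.new("publisher login works", ["publisher"], -> { raise "timeout" }),
].freeze

# Probe mode: run everything on a schedule and alert on any failure.
puts "Alert: #{failed_probes(PROBES).join(', ')}" unless failed_probes(PROBES).empty?

# Surrogate test mode: run only the probes tagged for the app being deployed.
puts "Deploy blocked" unless failed_probes(probes_for(PROBES, "frontend")).empty?
```

Note that the “search returns results” probe is selected for a `frontend` deployment even though it also depends on `search-api`. A failure there may have nothing to do with the app being deployed, which is the sense in which these tests are only “surrogates”.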

We use Sentry to monitor flaky probes and identify patterns we can fix.

Application Healthchecks

Healthchecks are a fast way to check an app is running at the end of a deployment. They can also perform detailed checks on the infrastructure the app needs to run.

Web apps have a /healthcheck/ready endpoint, which directly checks if the app can serve requests and connect to its infra e.g. a database.

Apps that are automatically deployed all the way to Production should follow the guidance on healthchecks (“safety checks”) in the Continuous Deployment RFC:

The healthcheck covers connectivity to all systems the app uses to read or write data e.g. databases, remote file systems, remote caches.
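
As a rough illustration of this guidance, a readiness check can run one connectivity check per dependency and report unhealthy if any of them fails. The sketch below is plain Ruby under stated assumptions, not the code GOV.UK apps actually use: the `healthcheck` and `check_status` helpers, the status strings, the 200/503 status codes and the stand-in checks are all hypothetical.

```ruby
require "json"

# Hypothetical sketch, not the GOV.UK implementation.
# A check is a callable that returns a truthy value when the dependency
# is reachable. A falsy return value or an exception counts as a failure.
def check_status(check)
  check.call ? "ok" : "critical"
rescue StandardError
  "critical"
end

# Run every check and build a response for a readiness endpoint.
# The 200/503 status codes are an assumption made for this example.
def healthcheck(checks)
  results = checks.to_h { |name, check| [name, check_status(check)] }
  healthy = results.values.all? { |status| status == "ok" }
  {
    http_status: healthy ? 200 : 503,
    body: JSON.generate({ status: healthy ? "ok" : "critical", checks: results }),
  }
end

# Stand-in checks: a real app would query its database, cache and so on.
CHECKS = {
  "database" => -> { true },
  "cache" => -> { raise IOError, "connection refused" },
}.freeze

response = healthcheck(CHECKS)
puts response[:http_status]
puts response[:body]
```

Reporting each dependency separately in the response body makes it easier to see which connection failed when a deployment is rejected.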

Note that service apps have no web server to query, so we check the process is running; apps should fail to start if they can’t connect to their infra (example).

Continuous Integration checks

Contract Tests

Also known as “Pact Tests”.

Contract tests check that APIs exposed by one app (the “provider”) are compatible with other apps that use those APIs (“consumers”). This is done with a shared set of tests, which both the provider and the consumers run.

Apps that are automatically deployed all the way to Production should follow the guidance on tests (“safety checks”) in the Continuous Deployment RFC:

Each endpoint with multiple, internal consumer apps has at least one contract test.
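
To make the idea of a shared contract concrete, here is a minimal sketch in plain Ruby. It deliberately does not use the Pact library's API. The `CONTRACT` data, the `/content/...` paths, and the `stub_provider` and `verify_provider` helpers are hypothetical and exist only for illustration.

```ruby
# Hypothetical contract shared between a consumer and a provider.
# Each interaction pairs a request with the response the consumer relies on.
CONTRACT = [
  {
    description: "a request for an existing content item",
    request: { method: "GET", path: "/content/example" },
    response: { status: 200, body_keys: %w[title base_path] },
  },
  {
    description: "a request for a missing content item",
    request: { method: "GET", path: "/content/missing" },
    response: { status: 404, body_keys: [] },
  },
].freeze

# Consumer side: build a stub provider from the contract, so the consumer's
# own tests only depend on behaviour that the contract records.
def stub_provider(contract)
  lambda do |method, path|
    interaction = contract.find { |i| i[:request] == { method: method, path: path } }
    raise ArgumentError, "no interaction for #{method} #{path}" unless interaction

    body = interaction[:response][:body_keys].to_h { |key| [key, "stub"] }
    { status: interaction[:response][:status], body: body }
  end
end

# Provider side: replay every recorded request against the real handler and
# return the descriptions of the interactions it does not satisfy.
def verify_provider(contract, handler)
  failures = contract.reject do |interaction|
    request = interaction[:request]
    expected = interaction[:response]
    actual = handler.call(request[:method], request[:path])
    actual[:status] == expected[:status] &&
      expected[:body_keys].all? { |key| actual[:body].key?(key) }
  end
  failures.map { |interaction| interaction[:description] }
end

# A toy provider. Extra fields such as "locale" do not break the contract.
PROVIDER = lambda do |_method, path|
  if path == "/content/example"
    { status: 200, body: { "title" => "Example", "base_path" => path, "locale" => "en" } }
  else
    { status: 404, body: {} }
  end
end

puts verify_provider(CONTRACT, PROVIDER).empty? ? "contract satisfied" : "contract broken"
```

The provider is free to add fields, but removing `title` or `base_path` would fail its own build before the change reaches any consumer.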

Contract tests are an important part of the overall test strategy for GOV.UK. If we picture them alongside unit and integration tests, the result is a “chain” of test coverage:

Consumer (Unit tests) <---> API (Contract Test) <---> Provider (Unit tests) <---> ...

The chain can be brittle, though: it can’t test incremental state changes across multiple apps - think about all the API calls and state changes involved in publishing a document on GOV.UK. End-to-end tests are an alternative way of checking for this kind of end-to-end behaviour.

Unit, Integration, etc. Tests

Most GOV.UK apps are built with Ruby on Rails and you should use specific tools and strategies for testing them. Some older apps use the Minitest framework because they were written before we adopted RSpec. We have migrated some apps to RSpec (example), but migrating an existing app should generally be avoided due to the effort required.

Apps that are automatically deployed all the way to Production should also follow the guidance on tests (“safety checks”) in the Continuous Deployment RFC:

- It has at least one JavaScript test, if it makes use of the language.
- Its code coverage exceeds 95%.
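
The coverage rule is simple arithmetic over per-line hit counts. The sketch below assumes line data in the shape that Ruby's built-in `Coverage` module reports in its default mode (an array per file, with a hit count for each executable line and `nil` for lines that cannot be executed). The `coverage_percent` and `coverage_check` helpers are hypothetical names, and in practice a coverage tool would normally enforce the threshold for you.

```ruby
# Hypothetical sketch of a line coverage threshold check.
# `files` maps a file name to an array of per-line hit counts, where nil
# marks a line that is not executable (blank lines, comments, `end`).
def coverage_percent(files)
  relevant = files.values.flatten.compact
  return 100.0 if relevant.empty?

  covered = relevant.count(&:positive?)
  (covered * 100.0 / relevant.size).round(2)
end

# The guidance says coverage "exceeds 95%", so the comparison is strict.
def coverage_check(files, threshold: 95.0)
  coverage_percent(files) > threshold
end

# Stand-in data: 3 of the 4 executable lines were run at least once.
EXAMPLE_FILES = {
  "app/models/document.rb" => [1, nil, 2, 0],
  "app/models/edition.rb" => [nil, 3],
}.freeze

puts coverage_percent(EXAMPLE_FILES)
puts coverage_check(EXAMPLE_FILES) ? "coverage ok" : "coverage too low"
```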

We use GitHub Actions to run these tests.