This page describes what to do in case of an Icinga alert. For more information, search the govuk-puppet repository for the source of the alert.
Last updated: 17 Nov 2020

GoReplay

GoReplay (previously "gor") is an open source tool we use to replay HTTP traffic from production to staging, which gives us greater confidence that our deploys are safe.

When users make GET, HEAD, or OPTIONS requests on production, each request is automatically replayed on staging in real time. This 'shadowing' technique acts as a load test, and it also presents our applications with real user queries we may not have tested for. Any errors should show up in Sentry or as 5xx responses in Grafana. As part of our deploy process, we check for these errors before deploying to production.
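Only read-only methods are shadowed, so replaying a request on staging should not change any data. GoReplay has an `--http-allow-method` option (repeated once per method) that whitelists methods in this way, though this page does not say whether GOV.UK's configuration uses it. The rule itself can be sketched as a small shell function; `should_replay` is a hypothetical helper for illustration, not part of GoReplay or govuk-puppet.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the shadowing rule described above:
# only GET, HEAD and OPTIONS requests are replayed to staging.
# HTTP method names are case-sensitive, so the match is exact.
should_replay() {
  case "$1" in
    GET|HEAD|OPTIONS) return 0 ;;  # read-only: replay to staging
    *) return 1 ;;                 # POST, PUT, DELETE, etc.: drop
  esac
}
```

For example, `should_replay POST || echo "not replayed"` prints `not replayed`.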

Read the blog post Putting the Router through its paces for a history of GoReplay on GOV.UK.

GoReplay during the data sync

While the data sync is in progress, GoReplay is temporarily disabled to avoid a flood of errors while databases are being dropped.

Puppet will remove GoReplay alerts while the data sync runs, but you may see the alerts at the beginning of a data sync, before Puppet has had time to remove them.

If you see alerts in production with the status PROCS CRITICAL: 0 processes with command name 'gor', this is expected until the data sync is complete.
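When triaging this alert, the question is whether `gor` is down because of a data sync or for some other reason. The function below is a hypothetical sketch of that decision. It assumes, without confirmation from this page, that a running data sync is signalled by a non-empty /etc/govuk/env.d/FACTER_data_sync_in_progress file, and it takes the `gor` process count as an argument (on a host you might get it from `pgrep -c -x gor`).

```shell
#!/usr/bin/env bash
# Hypothetical triage helper for the "0 processes with command name 'gor'"
# alert. ASSUMPTION: a non-empty flag file means a data sync is running.
#   $1: number of running gor processes (e.g. from `pgrep -c -x gor`)
#   $2: path to the data sync flag file
classify_gor_alert() {
  local gor_count="$1"
  local flag_file="$2"
  if [ "$gor_count" -gt 0 ]; then
    echo "ok"            # GoReplay is running; nothing to do
  elif [ -s "$flag_file" ]; then
    echo "expected"      # data sync appears to be in progress
  else
    echo "investigate"   # gor is down and no data sync is flagged
  fi
}
```

For example: `classify_gor_alert "$(pgrep -c -x gor)" /etc/govuk/env.d/FACTER_data_sync_in_progress`.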

Data sync process failed

If the data sync process aborts, GoReplay might not be restarted properly.

If so, check that the following file exists on the host:

/etc/govuk/env.d/FACTER_data_sync_in_progress

and that it is in its proper state, which is empty.
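The check above distinguishes three cases: the file is missing, empty, or non-empty. A minimal sketch of that check as a shell function follows; `flag_file_state` is a hypothetical helper, not an existing govuk-puppet script, and it defaults to the path given above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the state of the data sync flag file.
# Prints "missing", "empty" or "non-empty". Only "empty" is the
# proper state described on this page.
flag_file_state() {
  local path="${1:-/etc/govuk/env.d/FACTER_data_sync_in_progress}"
  if [ ! -e "$path" ]; then
    echo "missing"
  elif [ -s "$path" ]; then
    echo "non-empty"     # -s is true for files with size greater than zero
  else
    echo "empty"
  fi
}
```

Run `flag_file_state` with no arguments on the host to check the real file.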

If the file is not in this state, remove it and restart the GoReplay processes with the following Fabric command:

fab $environment puppet_class:gor sdo:'rm /etc/govuk/env.d/FACTER_data_sync_in_progress' app.start:goreplay

This removes the file and restarts GoReplay on all hosts running it.

When Puppet next runs on those hosts, it re-creates the alerts and they reappear in Icinga.