GoReplay (Traffic replay)
GoReplay (previously “gor”) is an open source tool we use to replay HTTP traffic from production to staging to give us greater confidence that our deploys are ok.
When users make GET, HEAD, or OPTIONS requests on production, the request is automatically replayed on staging in real time. This ‘shadowing’ technique acts as a load test, and also presents our applications with real user queries we may not have tested for. Any errors should show up in Sentry or as 5xx responses in Grafana. We check for these errors before deploying to production, as per our deploy process.
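To make the shadowing setup more concrete, here is a minimal sketch of a GoReplay command line that captures live traffic and replays only GET, HEAD, and OPTIONS requests. It assumes gor captures on port 80 and that staging is reachable at the hypothetical URL in STAGING_URL; neither value comes from GOV.UK's real configuration. The flags --input-raw, --output-http, and --http-allow-method are standard GoReplay options. The script prints the command instead of running it, so it can be reviewed first.

```shell
#!/bin/sh
# Sketch only: build (and print) a gor command that shadows read-only traffic.
# Assumptions: capture on port 80; STAGING_URL is a hypothetical placeholder.
STAGING_URL="${STAGING_URL:-https://staging.example.com}"

# Print the gor command line for a given replay target.
gor_command() {
  printf '%s' "gor --input-raw :80 --output-http $1"
  # Allow only safe, read-only methods; gor drops every other method.
  for method in GET HEAD OPTIONS; do
    printf ' %s' "--http-allow-method $method"
  done
  printf '\n'
}

gor_command "$STAGING_URL"
```

Capturing with --input-raw typically needs root (or raw-socket capabilities), which is one reason gor is usually run as a system service rather than by hand.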
Read the blog post “Putting the Router through its paces” for a history of GoReplay on GOV.UK.
GoReplay during the data sync
When the data sync is in progress, GoReplay is temporarily disabled so that dropping databases does not cause a flood of errors.
Puppet will remove GoReplay alerts while the data sync runs, but you may see the alerts at the beginning of a data sync, before Puppet has had time to remove them.
If you see alerts in production with the following status, this is expected until the data sync is complete:
PROCS CRITICAL: 0 processes with command name 'gor'
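The alert comes from a process check on the gor command. As a rough local equivalent, this sketch counts processes named exactly "gor" and prints a status line in the same format as the alert. It assumes the real check behaves like the Nagios plugin invocation check_procs -C gor -c 1: (critical when fewer than one process is found); GOV.UK's actual Icinga configuration may differ. It also assumes, based on its name, that the flag file used in the recovery steps on this page marks a data sync in progress. The FLAG_FILE override is a hypothetical convenience for testing.

```shell
#!/bin/sh
# Sketch only: approximate the "PROCS CRITICAL ... 'gor'" check locally.
# Assumption: the flag file exists while a data sync is in progress.
FLAG_FILE="${FLAG_FILE:-/etc/govuk/env.d/FACTER_data_sync_in_progress}"

# Format a status line for a given number of gor processes.
procs_status() {
  if [ "$1" -eq 0 ]; then state=CRITICAL; else state=OK; fi
  if [ "$1" -eq 1 ]; then noun=process; else noun=processes; fi
  echo "PROCS $state: $1 $noun with command name 'gor'"
}

# Say whether a zero count is expected because a data sync is running.
explain() {
  if [ "$1" -eq 0 ] && [ -e "$FLAG_FILE" ]; then
    echo "Expected: data sync in progress, GoReplay is disabled"
  elif [ "$1" -eq 0 ]; then
    echo "Unexpected: no data sync flag, GoReplay should be running"
  fi
}

# pgrep prints one PID per line for exact name matches; count the lines.
count=$(pgrep -x gor 2>/dev/null | wc -l | tr -d ' ')
procs_status "$count"
explain "$count"
```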
Data sync process failed
If the data sync process aborts, GoReplay might not be restarted properly.
If that’s the case, check that the following file exists on the host:
/etc/govuk/env.d/FACTER_data_sync_in_progress
and that it is in the expected state (that is, empty).
If not, remove the file and restart the GoReplay processes:
sudo rm /etc/govuk/env.d/FACTER_data_sync_in_progress
sudo service goreplay start
These commands remove the file and restart GoReplay on the host where you run them; repeat them on every host that runs GoReplay.
When Puppet next runs on those hosts, it re-creates the alerts and they reappear in Icinga.
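The recovery steps can be wrapped in a small per-host script. This is a sketch, not an official tool: the rm and service commands and the flag file path come from this page, while the existence check, the warning for a non-empty flag file, and the DRY_RUN switch are hypothetical additions. With the default DRY_RUN=1 it only prints what it would do; confirm that no data sync is still running before using it for real.

```shell
#!/bin/sh
# Sketch only: per-host recovery after an aborted data sync.
# DRY_RUN is a hypothetical switch: 1 prints commands, 0 executes them.
FLAG_FILE="${FLAG_FILE:-/etc/govuk/env.d/FACTER_data_sync_in_progress}"
DRY_RUN="${DRY_RUN:-1}"

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

recover_goreplay() {
  if [ -e "$FLAG_FILE" ]; then
    # The flag file is expected to be empty; mention it if it is not.
    if [ -s "$FLAG_FILE" ]; then
      echo "Note: $FLAG_FILE is not empty"
    fi
    run sudo rm "$FLAG_FILE"
  fi
  # Start GoReplay whether or not a stale flag file was found.
  run sudo service goreplay start
}

recover_goreplay
```

Saved under a hypothetical name such as recover-goreplay.sh, it could be run as DRY_RUN=0 sh recover-goreplay.sh on each host that runs GoReplay.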