When users make OPTIONS requests on production, each request is automatically
replayed on staging in real time. This 'shadowing'
technique acts as a load test, as well as presenting our applications with real user
queries we may not have tested for. Any errors should manifest themselves in Sentry or
as 5xx in Grafana. We check for these errors before deploying to production, as per
our deploy process.
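As a rough illustration of what shadowing looks like at the GoReplay level, the sketch below prints (rather than runs) a `gor` command line that captures traffic on a local port and replays one HTTP method against another host. The `shadow_command` helper and the `staging.example.org` address are hypothetical, and the real GOV.UK configuration is managed by Puppet and may differ. Only the `--input-raw`, `--output-http` and `--http-allow-method` flags are taken from GoReplay itself.

```shell
#!/bin/sh
# Placeholder target, NOT the real GOV.UK staging address.
STAGING_URL="https://staging.example.org"

# Hypothetical helper: prints a gor command line instead of executing it.
#   $1 = local port to capture traffic from
#   $2 = HTTP method allowed through to the replay target
shadow_command() {
  # --input-raw captures live traffic on the given port,
  # --output-http replays it against the target URL,
  # --http-allow-method restricts replay to the named method.
  printf 'gor --input-raw :%s --output-http %s --http-allow-method %s\n' \
    "$1" "$STAGING_URL" "$2"
}

shadow_command 80 OPTIONS
```

Printing the command first makes it possible to review the flags before anything is replayed.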
Read the blog post "Putting the Router through its paces" for a history of GoReplay on GOV.UK.
GoReplay during the data sync
Puppet will remove GoReplay alerts while the data sync runs, but you may see the alerts at the beginning of a data sync, before Puppet has had time to remove them.
If you see errors in production with the status of
PROCS CRITICAL: 0 processes with command name 'gor', this is expected until the
data sync is complete.
Data sync process failed
If the data sync process aborts, GoReplay might not be restarted properly.
If that's the case, make sure that the following file exists on the host:
/etc/govuk/env.d/FACTER_data_sync_in_progress
and that it is in a proper state (i.e. empty).
If not, restart the GoReplay processes with the following Fabric command:
fab $environment puppet_class:gor sdo:'rm /etc/govuk/env.d/FACTER_data_sync_in_progress' app.start:goreplay
This will remove the file and restart GoReplay on all hosts running it.
When Puppet next runs on those hosts, it will re-create the alerts and you will see them back in Icinga.
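Before running the Fabric command, the state of the file it removes can be checked on a host with a small script. This is a minimal sketch: the `flag_state` helper is hypothetical, and the path is copied from the Fabric command above.

```shell
#!/bin/sh
# Path taken from the Fabric command in this section.
FLAG_FILE="/etc/govuk/env.d/FACTER_data_sync_in_progress"

# Hypothetical helper: reports whether a file is missing, empty or non-empty.
flag_state() {
  if [ ! -e "$1" ]; then
    echo "missing"
  elif [ -s "$1" ]; then
    # -s is true when the file exists and has a size greater than zero.
    echo "non-empty"
  else
    echo "empty"
  fi
}

echo "$FLAG_FILE: $(flag_state "$FLAG_FILE")"
```

The script only reads the file's state and changes nothing on the host.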