Gor is an open source tool for replaying HTTP traffic. We use it to replay traffic from production to staging to give us greater confidence that our deploys are ok.
Alerts for Gor might let you know that it’s not running, which means we have to be much more cautious with deploys.
The nightly data sync stops Gor while data is syncing, so that we don’t get lots of errors in staging while we’re dropping databases.
Puppet will remove these alerts while the data sync runs but you may see the alerts at the beginning of a data sync, before Puppet has had time to remove them.
Data sync process failed
If the data sync process aborts, Gor might not be restarted properly. If that’s the case, make sure that the file /etc/govuk/env.d/FACTER_data_sync_in_progress (the file removed by the Fabric command below) exists on the host and that it is in a proper state (i.e. empty). If not, you can restart the Gor processes with the following Fabric command:
fab $environment puppet_class:gor sdo:'rm /etc/govuk/env.d/FACTER_data_sync_in_progress' app.start:gor
This will remove the file and restart Gor on all hosts running it.
When Puppet runs again on those hosts, it will re-create the alerts and they will be visible in Icinga again.
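Before running the Fabric command, it can help to check the state of the file on a host. The following is a minimal sketch, not part of the existing tooling: the function name check_sync_file is hypothetical, and the default path is taken from the Fabric command above.

```shell
# Hypothetical helper: report the state of the data sync file.
# Prints "missing", "empty" or "not empty".
check_sync_file() {
    # Default path is the one removed by the Fabric command above.
    file="${1:-/etc/govuk/env.d/FACTER_data_sync_in_progress}"
    if [ ! -e "$file" ]; then
        echo "missing"
    elif [ -s "$file" ]; then
        # -s is true when the file exists and has a size greater than zero.
        echo "not empty"
    else
        echo "empty"
    fi
}
```

Calling check_sync_file with no argument checks the default path; passing a path as the first argument checks that file instead.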
gor running critical errors in production
When a data sync job is in progress, you may see errors in production with the following status:
PROCS CRITICAL: 0 processes with command name 'gor'
This is expected, and you can check the progress of the job in Jenkins.