This page describes what to do in case of an Icinga alert. For more information, you can search the govuk-puppet repository for the source of the alert.

Gor

Gor is an open source tool for replaying HTTP traffic. We use it to replay traffic from production to staging to give us greater confidence that our deploys are ok.

An alert for Gor may mean that it is not running, in which case we have to be much more cautious with deploys.

The nightly data sync stops Gor while data is syncing, so that we don’t get lots of errors in staging while we’re dropping databases.

Puppet removes these alerts while the data sync runs, but you may see them at the beginning of a data sync, before Puppet has had time to remove them.

Data sync process failed

If the data sync process aborts, gor might not be restarted properly. If that happens, make sure that the following file exists on the host:

/etc/govuk/env.d/FACTER_data_sync_in_progress

And that it is in the expected state (that is, empty). If it is not, you can restart the gor processes with the following Fabric command:

fab $environment puppet_class:gor sdo:'rm /etc/govuk/env.d/FACTER_data_sync_in_progress' app.start:gor

This removes the file and restarts gor on all hosts that run it.

When Puppet next runs on those hosts, it will re-create the alerts and they will reappear in Icinga.
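Before running the Fabric command, you may want to check the flag file on a single host. The sketch below is a hypothetical helper (`flag_state` is not part of govuk-puppet or the Fabric scripts); it only reports whether the file named on this page is absent, present and empty, or present with content, and changes nothing.

```shell
#!/bin/sh
# Hypothetical helper: report the state of the data sync flag file.
# The default path is the one given on this page; the optional argument
# exists only so the helper can be tried against a scratch file.
flag_state() {
  flag_file="${1:-/etc/govuk/env.d/FACTER_data_sync_in_progress}"
  if [ ! -e "$flag_file" ]; then
    echo "absent"
  elif [ -s "$flag_file" ]; then
    # -s is true when the file exists and its size is greater than zero
    echo "present-non-empty"
  else
    echo "present-empty"
  fi
}
```

For example, source the snippet on the host and run `flag_state` with no arguments to inspect the real flag file.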

'gor running' critical errors in production

While a data sync job is in progress, you may see errors in production with the status PROCS CRITICAL: 0 processes with command name 'gor'. This is expected, and you can check the progress of the job in Jenkins.
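To see on a host roughly what this alert sees, you can count the processes whose name is exactly gor. This is a minimal sketch assuming a Linux host with the procps `pgrep` tool; `gor_process_count` is a hypothetical helper, not the Icinga check itself.

```shell
#!/bin/sh
# Hypothetical helper: print how many processes are named exactly "gor"
# (or the name passed as the first argument). Assumes procps pgrep.
gor_process_count() {
  # -c prints the number of matches (0 when there are none);
  # -x requires an exact match on the process name.
  # pgrep exits non-zero when nothing matches, so ignore that status.
  pgrep -c -x "${1:-gor}" || true
}
```

A result of 0 outside a data sync window matches the CRITICAL state described above.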

More about Icinga alerts

This page is owned by #2ndline and needs to be reviewed