This page describes what to do when an Icinga alert fires. For more information, search the govuk-puppet repo for the source of the alert.

Icinga alerts

MySQL: replication lag

Checks the value of Seconds_Behind_Master against a threshold. As described in the MySQL documentation:

Seconds_Behind_Master: The number of seconds that the slave SQL thread is behind processing the master binary log. A high number (or an increasing one) can indicate that the slave is unable to handle events from the master in a timely fashion.

This check cannot reliably detect when replication has completely stopped or broken. In that case, Seconds_Behind_Master is NULL and the check raises an UNKNOWN alert. This should correlate with a CRITICAL alert for ‘mysql replication running’.
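The behaviour described above can be sketched as a small shell function. This is an illustration only, not the actual check from govuk-puppet: the function name `classify_replication_lag` and the default thresholds are assumptions, and the exit codes follow the usual Icinga/Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```shell
# Sketch only: maps a Seconds_Behind_Master value to an Icinga-style state.
# The default thresholds (300 and 600 seconds) are assumptions for illustration,
# not the values configured in govuk-puppet.
classify_replication_lag() {
  lag="$1"
  warn="${2:-300}"
  crit="${3:-600}"

  # NULL, empty, or non-numeric input means the lag cannot be measured.
  case "$lag" in
    ''|*[!0-9]*) echo "UNKNOWN"; return 3 ;;
  esac

  if [ "$lag" -ge "$crit" ]; then
    echo "CRITICAL"; return 2
  elif [ "$lag" -ge "$warn" ]; then
    echo "WARNING"; return 1
  fi
  echo "OK"
}

# Example use on a replica (requires MySQL client access):
#   lag=$(mysql -e 'SHOW SLAVE STATUS\G' | awk '/Seconds_Behind_Master/ {print $2}')
#   classify_replication_lag "$lag"
```

A NULL value falls into the UNKNOWN branch, which is why a stopped replica shows up as UNKNOWN here rather than CRITICAL.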

If Seconds_Behind_Master shows as NULL, you may be able to fix replication by running the mysql.reset_slave Fabric task:

fab $environment -H <mysql_slave_hostname> mysql.reset_slave

An alert for ‘MySQL replication lag’ on mysql-backup-* machines may be caused by a database backup or a Jenkins data synchronisation job, such as:

https://deploy.publishing.service.gov.uk/job/Copy_Data_to_Staging/

You can verify this by checking for a running mysqldump process on the affected host, e.g.:

ps auxwww | grep mysqldump

In such cases, the alert should return to normal once the backup completes.
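Note that piping `ps` into `grep mysqldump` also lists the `grep` process itself, so a match does not always mean a backup is running. A minimal sketch of a stricter check is shown below. The helper name `is_process_running` is hypothetical, and it assumes a Linux `ps` where `-o comm=` prints the bare command name.

```shell
# Sketch only: succeed if a process with exactly this command name exists.
# Matching whole lines of `ps -e -o comm=` avoids matching the grep itself.
is_process_running() {
  ps -e -o comm= | grep -x -- "$1" >/dev/null
}

# Example use on the affected host:
#   if is_process_running mysqldump; then
#     echo "mysqldump is running; the lag alert is probably backup-related"
#   fi
```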

This page was last reviewed on 17 April 2019. It needs to be reviewed again on 17 October 2019 by the page owner #govuk-2ndline.