This page describes what to do when this Icinga alert fires. For more information, search the govuk-puppet repo for the source of the alert.

Icinga alerts

reboot required by apt

This check indicates that new packages have been installed, but the machines must be rebooted before those packages take effect.

The check contains a list of hosts that require a reboot. Click into the warning for the list.

Under normal circumstances most machines reboot automatically, so the list shows only those that need to be rebooted manually, such as database clusters. If the list includes machines that should have rebooted automatically, there may be a problem with the locking mechanism.

When on 2nd Line

This alert is a common occurrence in production and staging environments. It’s unlikely to occur in integration because the machines are powered on each day.

Typically you can manage this alert with the following steps:

In staging

It is acceptable for most, if not all, machines to be rebooted in staging during the working day. However, data loss can occur so apply caution.

Work through the documentation on rebooting machines, following the procedures particular to each machine.

In production

In production, you are normally dealing with a variety of hosts. Some machines, like MongoDB and Elasticsearch, can be rebooted with caution. Others, like MySQL and PostgreSQL, require out-of-hours reboots by On Call staff.

Work through the guide on rebooting machines to safely reboot the machines that can be, and kindly ask On Call staff to schedule out-of-hours reboots for the other machines.

Checking locking status

locksmith manages unattended reboots to ensure that systems remain available. It is possible for a problem to occur that prevents machines from rebooting automatically. The following commands assume you have correctly set up your fabric scripts.

$ fab <environment> all locksmith.status

If a lock is in place, the output will detail which machine holds the lock.

You can remove the lock with:

$ fab <environment> -H <machine-name> locksmith.unlock:"<machine-name>"

Machines that are safe to reboot should then do so at the scheduled time.
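
The two commands above can be wrapped in a small dry-run helper so they can be reviewed before being run. This is a minimal sketch, not part of the fabric scripts: the function names `locksmith_status_cmd` and `locksmith_unlock_cmd` are hypothetical, and the functions only print the `fab` commands shown on this page with the placeholders filled in.

```shell
#!/bin/sh
# Dry-run sketch: prints the fab commands from this page instead of running them.
# The function names are hypothetical; only the printed commands come from this page.

# Print the command that reports locksmith lock status for an environment.
locksmith_status_cmd() {
  [ "$#" -eq 1 ] || { echo "usage: locksmith_status_cmd <environment>" >&2; return 1; }
  printf 'fab %s all locksmith.status\n' "$1"
}

# Print the command that removes the lock held by a given machine.
locksmith_unlock_cmd() {
  [ "$#" -eq 2 ] || { echo "usage: locksmith_unlock_cmd <environment> <machine-name>" >&2; return 1; }
  printf 'fab %s -H %s locksmith.unlock:"%s"\n' "$1" "$2" "$2"
}
```

For example, `locksmith_unlock_cmd staging <machine-name>` prints the unlock command for that machine in staging, which you can then copy and run once you have confirmed from the status output that the machine holds the lock.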

This page was last reviewed on 10 July 2019. It needs to be reviewed again on 10 January 2020 by the page owner #govuk-2ndline.