This page describes what to do in case of an Icinga alert. For more information, you can search the govuk-puppet repo for the source of the alert.
Warning: This document has not been updated for a while now. It may be out of date.
Last updated: 8 Sep 2020

RabbitMQ: No consumers listening to queue

Check that there is at least one non-idle consumer for rabbitmq queue {queue_name}

Icinga connects to RabbitMQ’s admin API to check the activity of the consumers and to verify that at least one consumer is running for a given RabbitMQ message queue. See here for the plugin that implements the alert.
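
You can run the same kind of query by hand against the management API. This is only a sketch: the hostname, credentials and queue name below are placeholders, and the default vhost / is URL-encoded as %2F.
$ curl -s -u monitoring:PASSWORD http://rabbitmq.example:15672/api/queues/%2F/email-alert-service | jq '.consumers'
A value of 0 here corresponds to the critical ‘no consumers’ state described below.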

For information about how we use RabbitMQ, see here.

No consumers listening to queue

This check reports a critical error when no consumers are listening to the queue, meaning messages entering the queue will never be processed.

No activity for X seconds: idle times are [X, Y, Z]

This checks whether the RabbitMQ consumers for a queue have been active in the last 5 minutes. Consumers in an idle state are listening to the queue but are unable to process the messages on it.
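
Depending on the RabbitMQ version, the same queue object from the management API also exposes an idle_since timestamp alongside the consumer count, which can help confirm whether activity has stopped. Again a sketch, with placeholder host, credentials and queue name:
$ curl -s -u monitoring:PASSWORD http://rabbitmq.example:15672/api/queues/%2F/email-alert-service | jq '{consumers, idle_since}'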

Publishing API sends a heartbeat message every minute to the following queues: email-alert-service, cache_clearing_service-high and content_data_api. This is configured via the queues’ bindings (e.g. email-alert-service’s binding) matching the routing key used in the heartbeat. This should ensure that the consumers for these queues are never idle.
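
To confirm that a queue is still bound to the routing key used for the heartbeat, you can list its bindings through the management API. A sketch, with placeholder host, credentials and queue name:
$ curl -s -u monitoring:PASSWORD http://rabbitmq.example:15672/api/queues/%2F/email-alert-service/bindings | jq '.[] | {source, routing_key}'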

Note

You may see the high unprocessed messages alert too, as issues with consumers processing messages could then lead to a high backlog of messages.

Troubleshooting

Check the RabbitMQ logs

Check the logs for the rabbitmq application by logging into Kibana and searching for application: rabbitmq. Is there evidence of any errors?
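
If you have SSH access to the RabbitMQ node, the broker’s own log files can also be read directly. On a standard package install they live under /var/log/rabbitmq - this path is an assumption about the install rather than anything specific to our setup:
$ sudo tail -n 200 /var/log/rabbitmq/*.log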

Check the RabbitMQ Grafana dashboard

This Grafana dashboard shows activity across multiple exchanges and queues. The main exchange we expect to be monitoring is published_documents, which handles broadcasting to services such as search and email-alert-service when content changes across GOV.UK.

Looking at the queue graphs, we should look out for the following:

  • Check for high ‘ready’ messages for the queue - this indicates messages waiting in RabbitMQ to be processed by the consumer (e.g. email-alert-service).

  • Check for high ‘unacknowledged’ messages for the queue - this implies that messages have been read by the consumer but the consumer has not sent back an ACK to the RabbitMQ broker to confirm that it has finished processing them.

  • Check for a high ‘redeliver’ rate for the queue - in the event of a network failure (or a node failure), messages can be redelivered. An example is if the consumer dies (its channel is closed, connection is closed, or TCP connection is lost).

If we’re seeing a high ‘redeliver’ rate, or high ‘ready’ or ‘unacknowledged’ messages, this could indicate an issue with the consumer. You can cross-check these numbers from the command line, as shown below.
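
As a sketch, rabbitmqctl on the RabbitMQ node can report the same counters per queue (this assumes shell access to the node):
$ sudo rabbitmqctl list_queues name messages_ready messages_unacknowledged consumers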

Troubleshooting steps

  1. You could try restarting the consumer applications. After restarting, check to see if the problem is solved. E.g. for email-alert-service, run:
$ fab $environment class:email_alert_api app.restart:email-alert-service
  2. If the issue has not been resolved, we should check the consumer applications’ logs to see if any errors are being thrown.