Nagios NRPE connection failures
Nagios uses a protocol called NRPE (Nagios Remote Plugin Executor) to perform checks on remote machines. Monitored machines run an ‘NRPE agent’ which listens for requests to execute monitoring checks.
Occasionally the Nagios server is unable to connect to an NRPE agent on one of the monitored machines. In this case you will see an alert with a message such as ‘connection to NRPE could not be established’.
Sometimes this is a false alarm caused by load on the monitored machine. The NRPE errors then usually disappear of their own accord when Nagios retries the check (within a minute or two). If the errors persist for longer than a few minutes, there may be a genuine issue and you should investigate.
Note: This example uses hostnames for Carrenza, and will not work on AWS.
For AWS you’ll need to use the hosts from
More information on hostnames in different environments is available here:
First verify that NRPE is running on the monitored machine:
$ ssh broken-machine-1.broken.staging
$ nc -v localhost 5666
Connection to localhost 5666 port [tcp/nrpe] succeeded!
If the connection succeeds then the NRPE agent is running on that machine.
A failure indicates that the agent is not running, and you should investigate. Try running govuk_puppet --test on the machine, which should restart the service. If the Puppet run fails, check its output for errors.
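Because load on the machine can cause transient failures, it can help to repeat the probe a few times before concluding that the agent is down. The following is a minimal sketch of that idea, not part of the GOV.UK tooling: `retry` is a hypothetical helper, and the example invocation assumes the installed `nc` supports the `-z` (connect without sending data) and `-w` (timeout in seconds) options, as the OpenBSD variant does.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or ATTEMPTS is used up.
# Returns 0 on the first success, 1 if every attempt fails.
retry() {
  local attempts delay i
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Wait between attempts, but not after the final one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example, run on the monitored machine itself (assumes nc supports -z and -w):
#   retry 3 10 nc -z -w 5 localhost 5666 || echo "NRPE port still unreachable"
```

If the probe succeeds on a later attempt, the original alert was probably transient. If every attempt fails, treat the agent as not running.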
If the agent is running, next check that you can connect to it from the monitoring server:
$ ssh monitoring-1.management.staging
$ nc -v broken-machine-1.broken 5666
Connection to broken-machine-1.broken 5666 port [tcp/nrpe] succeeded!
A failure indicates a networking issue between the monitoring server and the monitored machine.
Note that the NRPE port is firewalled and only accessible from the monitoring machine and the box itself.
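The two checks above amount to a small decision procedure, which can be sketched as a shell function. This is an illustration only: `diagnose` and its output labels are hypothetical names, and the usage comments assume an `nc` that supports `-z` and `-w`.

```shell
# diagnose LOCAL_STATUS REMOTE_STATUS
# Hypothetical helper that maps the exit statuses of the two probes to a diagnosis.
#   LOCAL_STATUS:  exit status of the probe run on the monitored machine itself
#   REMOTE_STATUS: exit status of the probe run from the monitoring server
diagnose() {
  if [ "$1" -ne 0 ]; then
    # The agent does not answer even locally, so it is probably not running.
    echo "agent-not-running"
  elif [ "$2" -ne 0 ]; then
    # The agent answers locally but not from the monitoring server.
    echo "network-issue"
  else
    echo "ok"
  fi
}

# Example usage (hostnames taken from the examples above):
#   ssh broken-machine-1.broken.staging 'nc -z -w 5 localhost 5666'
#   local_status=$?
#   ssh monitoring-1.management.staging 'nc -z -w 5 broken-machine-1.broken 5666'
#   remote_status=$?
#   diagnose "$local_status" "$remote_status"
```

Note that `ssh` returns its own non-zero status (255) when the SSH connection itself fails, so a non-zero local status may also mean the machine could not be reached at all.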