
Monitoring

Tools: Icinga, Grafana and Graphite, Kibana and Fabric

Icinga

Icinga runs the monitoring checks we have set up and shows the alerts they raise.

Grafana

Grafana lets us create dashboards using data from Graphite, Elasticsearch (Logit) and CloudWatch.

Graphite

https://graphite.publishing.service.gov.uk/

Graphite is a graphing tool that allows us to draw graphs of various metrics that we put into it. Graphite has two main views: a composer to build individual graphs and a dashboard to put multiple graphs together.

We are currently locked at version 0.9.13.

To build a graph, add one or more graph targets in the composer by clicking on them in the left frame. Some useful targets are:

  • stats.cache-?_router.nginx_logs.www-origin.http_5xx to graph the rate of HTTP errors from all cache machines (the question mark is a wildcard that pattern-matches multiple data series; * also works).
  • stats.backend-?_backend.nginx_logs.content-store_publishing_service_gov_uk.http_5xx to show HTTP errors for a specific app on all backend machines.
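As a rough illustration of how the wildcards in these targets select series, here is a minimal Python sketch. It is not how Graphite itself is implemented: it assumes that a wildcard only matches within a single dot-separated part of the name, and it uses Python's fnmatch module as a stand-in. The series names in the list are hypothetical.

```python
from fnmatch import fnmatchcase


def matches(pattern: str, series: str) -> bool:
    """Return True if a series name matches a target pattern.

    Assumption: wildcards never match across a dot, so the pattern and
    the series are compared one dot-separated part at a time.
    """
    pattern_parts = pattern.split(".")
    series_parts = series.split(".")
    if len(pattern_parts) != len(series_parts):
        return False
    return all(fnmatchcase(s, p) for p, s in zip(pattern_parts, series_parts))


PATTERN = "stats.cache-?_router.nginx_logs.www-origin.http_5xx"

# Hypothetical series names, for illustration only.
SERIES = [
    "stats.cache-1_router.nginx_logs.www-origin.http_5xx",
    "stats.cache-2_router.nginx_logs.www-origin.http_5xx",
    "stats.backend-1_backend.nginx_logs.www-origin.http_5xx",
]

matched = [name for name in SERIES if matches(PATTERN, name)]
print(matched)
```

In this sketch ? stands for exactly one character, so a hypothetical machine numbered 10 would need * instead.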

The composer offers tab completion, although it doesn’t handle patterns very well.

To add one of these graphs to a dashboard, you can copy the graph image URL and select Graphs → New Graph → From URL from the dashboard menu.

Both Graphite views let you adjust the time range of graphs, although each does it differently. The composer view offers two buttons to select absolute and relative time ranges, and the dashboard view has labelled buttons for the same purpose.
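A graph image URL is a request to Graphite's render endpoint, with the targets and time range passed as query parameters. The following Python sketch builds such a URL by hand. The function name render_url and its default values are assumptions made for illustration, and a URL copied from the composer may carry extra parameters.

```python
from urllib.parse import urlencode

GRAPHITE_BASE = "https://graphite.publishing.service.gov.uk"


def render_url(targets, from_="-1h", until="now", width=800, height=400):
    """Build a Graphite render URL for one or more targets.

    from_ and until accept relative ranges such as "-1h" or "now".
    The defaults here are illustrative, not our standard settings.
    """
    params = [("target", target) for target in targets]
    params += [
        ("from", from_),
        ("until", until),
        ("width", width),
        ("height", height),
    ]
    # urlencode percent-encodes characters such as "?" in the target.
    return f"{GRAPHITE_BASE}/render?{urlencode(params)}"


url = render_url(
    ["stats.cache-?_router.nginx_logs.www-origin.http_5xx"],
    from_="-6h",
)
print(url)
```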

Our deployment dashboards use Graphite extensively. See some tips on how to best manipulate the data streams to create useful dashboards.

Applying Functions

Apply Graphite functions to your data to make it more useful.

One particularly useful Graphite function is keepLastValue. If your graphs come out nearly black with a few spots of colour in them, you probably want this one. Both views have an “Apply Function” button.
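Graphs look mostly empty when a series has data for only a few points in each time range and nulls everywhere else. The Python sketch below shows the idea behind keepLastValue on a plain list, where None stands for a missing data point. It is a simplified illustration, not Graphite's own code, and the sample values are made up.

```python
def keep_last_value(values, limit=None):
    """Fill runs of None with the value that came just before the run.

    A run is only filled if it is at most `limit` points long
    (limit=None means no limit). Leading Nones are left alone because
    there is no earlier value to carry forward.
    """
    filled = list(values)
    run_start = None
    # Iterate one step past the end so a trailing run is also closed.
    for i in range(len(filled) + 1):
        is_gap = i < len(filled) and filled[i] is None
        if is_gap:
            if run_start is None:
                run_start = i
            continue
        if run_start is not None:
            run_length = i - run_start
            has_previous = run_start > 0
            if has_previous and (limit is None or run_length <= limit):
                for j in range(run_start, i):
                    filled[j] = filled[run_start - 1]
            run_start = None
    return filled


sparse = [None, 3, None, None, 5, None]
print(keep_last_value(sparse))
print(keep_last_value(sparse, limit=1))
```

In Graphite itself you apply the function by wrapping the target, for example keepLastValue(stats.cache-?_router.nginx_logs.www-origin.http_5xx).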

Kibana

Kibana is a log viewer and search engine. Access GOV.UK Kibana through Logit.

In Kibana, you can filter down log messages to show you just the ones you want. Say you’ve spotted a large number of errors coming from the content store related to MongoDB connections, and you want to find out whether the MongoDB logs show anything strange.

You can narrow down which log messages you want using the column browser on the left: @source_host and application are some particularly useful ones. The magnifying glass symbol next to each value lets you build up a query string and tinker with it.
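The query string that the magnifying glass builds up is a set of field:value clauses joined together. This Python sketch assembles one by hand, assuming the Lucene-style query string syntax that Kibana accepts. The helper name build_query and the field values are hypothetical; use the values you see in the column browser.

```python
def build_query(filters):
    """Join field/value pairs into an AND query string.

    Each value is wrapped in double quotes so it is matched as a
    phrase; backslashes and quotes inside the value are escaped.
    """
    clauses = []
    for field, value in filters.items():
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'{field}:"{escaped}"')
    return " AND ".join(clauses)


# Hypothetical values, for illustration only.
query = build_query({
    "application": "content-store",
    "@source_host": "example-host",
})
print(query)
```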

You can adjust the time range manually with the drop-down at the top or by dragging on the timeline.

Check out some of the useful Kibana queries to get an idea of what’s possible.

Logs are sent to Kibana using Filebeat.

Fabric Scripts

https://github.com/alphagov/fabric-scripts/

The Fabric scripts are useful for running a command on a set of machines. For instance, to reload all instances of the content store on backend boxes:

fab $environment class:backend app.reload:content-store

Check the app.py module for the other tasks you can use. To run a more specific command, use the sdo task (sdo stands for sudo):

fab $environment class:backend sdo:"service content-store reload"
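Both commands above follow the same shape: fab, an environment, a machine class, then a task with its argument. The Python sketch below assembles that command line so the quoting of the sdo argument is visible. The helper name fab_command is hypothetical and "integration" is only an assumed example value for $environment. The sketch prints the commands and does not run them.

```python
import shlex


def fab_command(environment, machine_class, task):
    """Return the argument list for a fab invocation.

    `task` is the task name plus its argument, for example
    "app.reload:content-store".
    """
    return ["fab", environment, f"class:{machine_class}", task]


# "integration" is an assumed environment name, for illustration only.
reload_cmd = fab_command("integration", "backend", "app.reload:content-store")
sudo_cmd = fab_command("integration", "backend", "sdo:service content-store reload")

# shlex.join (Python 3.8+) quotes any argument that contains spaces.
print(shlex.join(reload_cmd))
print(shlex.join(sudo_cmd))
```

The task and its argument must reach fab as a single shell word, which is why the sdo example needs quotes around the command.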

For more information, check out the Fabric scripts README.


This page was last reviewed on 25 April 2019. It needs to be reviewed again on 25 October 2019 by the page owner #govuk-2ndline.