Grafana
Grafana is an open-source visualisation tool. It does not store data, but consumes data sources to create real-time graphs displayed on custom dashboards. Data sources include Prometheus, Graphite, Logit and CloudWatch. The query language of the data store, such as PromQL for Prometheus, is used to construct the graphs.
Grafana dashboards
Useful Grafana dashboards:
The full list of Grafana dashboards is stored in the Puppet repo. For details on how to create a new dashboard, read the Grafana dashboards alert documentation.
Grafana tips
You can use regexes to filter for relevant information. For example, enter *frontend* on the processes dashboard to see all processes that have ‘frontend’ in them.
We often show multiple metrics on the same graph. The position of the key shows which Y-axis each metric corresponds to:
You can click on a metric in a graph to show only that metric, or you can CMD + click to select multiple:
Annotations on charts show events such as deploys:
For more tips, see the Introduction to Grafana slides.
Fixing N/A in dashboards
When a request for data times out, Grafana will render an “N/A” in the panel. Usually refreshing the page or choosing a shorter time range fixes the issue.
If a dashboard consistently returns “N/A”, then there may be an underlying issue.
In the failing panel, open Query Inspector, and read the error message for clues:
If you see the following error:
raise CorruptWhisperFile("Unable to read header", fh.name)
CorruptWhisperFile: Unable to read header (/opt/graphite/storage/whisper/stats/govuk/app/collections-publisher/ip-10-1-5-36/errors_occurred.wsp)
…that suggests the disk was full at the time of writing to Graphite. The solution is to remove the corrupt file, and ensure there is space on the disk.
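To check whether the disk is still full, a minimal sketch like the one below may help. It reports how full the filesystem holding the Whisper files is. The function name whisper_disk_usage is hypothetical, and the default path is assumed from the error message above, so adjust it if your Graphite storage lives elsewhere.

```shell
# Hypothetical helper: print the "Capacity" column (for example 87%) for the
# filesystem that holds the given directory.
# The default path is an assumption taken from the error message above.
whisper_disk_usage() {
  dir="${1:-/opt/graphite/storage/whisper}"
  # -P forces the single-line POSIX output format, so field 5 is Capacity.
  df -P "$dir" | awk 'NR == 2 { print $5 }'
}
```

Calling whisper_disk_usage on the Graphite machine prints a percentage. A value at or near 100% suggests the disk is still full and newly written files may be corrupted again.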
SSH into the relevant machine and run more errors_occurred.wsp to see the file contents, or ls -lsa in the directory to see the file sizes. This should confirm a file size of zero.
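To check the whole Whisper tree rather than a single directory, something like the following sketch lists every zero-byte file without deleting anything. The function name list_empty_whisper_files is hypothetical, and the default path is assumed from the error message above. It uses the same find tests (-type f -empty) as the delete command in this section, so the output is a preview of what that command would remove.

```shell
# Hypothetical helper: list empty files under the Whisper storage directory.
# Nothing is deleted; this only prints the matching paths.
# The default path is an assumption taken from the error message above.
list_empty_whisper_files() {
  dir="${1:-/opt/graphite/storage/whisper}"
  find "$dir" -type f -empty -print
}
```

On the Graphite machine you may need to run the find command itself with sudo if your user cannot read the storage directory. If the list contains anything unexpected, investigate before deleting.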
Delete all empty (corrupt) WSP files with:
sudo find /opt/graphite/storage/whisper/ -type f -empty -delete
You should now find the dashboard panels load properly.