Grafana is an open-source visualisation tool. It does not store data, but consumes data sources to create real-time graphs displayed on custom dashboards. Data sources include Prometheus, Graphite, Logit and CloudWatch. The query language of the data store, such as PromQL for Prometheus, is used to construct the graphs.
Useful Grafana dashboards:
You can use regexes to filter for relevant information. For example, enter
*frontend* on the processes dashboard
to see all processes that have 'frontend' in their name.
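To make the matching behaviour concrete, here is a minimal shell sketch. It assumes the dashboard filter treats *frontend* like a shell-style wildcard, where * stands for any run of characters, including none. That is an assumption about the dashboard, not something confirmed here. The function names and the process names in the usage note are hypothetical.

```shell
# Illustration only: mimics a "*frontend*" wildcard filter with a shell
# case pattern. This is NOT how Grafana evaluates the filter internally.

# Succeeds if the given name contains "frontend" anywhere.
matches_frontend() {
  case "$1" in
    *frontend*) return 0 ;;
    *) return 1 ;;
  esac
}

# Reads one name per line on stdin and prints the names that match.
filter_frontend() {
  while IFS= read -r name; do
    if matches_frontend "$name"; then
      printf '%s\n' "$name"
    fi
  done
}
```

For example, piping the hypothetical names government-frontend, frontend, publisher and static through filter_frontend keeps only the first two, because the pattern matches the word anywhere in the name.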
We often show multiple metrics on the same graph. The position of the key shows which Y-axis each metric corresponds to:
You can click on a metric in a graph to show only that metric, or
CMD + click to select multiple:
Annotations on charts show events such as deploys:
For more tips, see the Introduction to Grafana slides.
Fixing N/A in dashboards
When a request for data times out, Grafana will render an "N/A" in the panel. Usually refreshing the page or choosing a shorter time range fixes the issue.
If a dashboard consistently returns "N/A", then there may be an underlying issue.
In the failing panel, open Query Inspector, and read the error message for clues:
If you see the following error:
raise CorruptWhisperFile("Unable to read header", fh.name)
CorruptWhisperFile: Unable to read header (/opt/graphite/storage/whisper/stats/govuk/app/collections-publisher/ip-10-1-5-36/errors_occurred.wsp)
…that suggests the disk was full at the time of writing to Graphite. The solution is to remove the corrupt file, and ensure there is space on the disk.
SSH into the relevant machine, then run
more errors_occurred.wsp to see the file contents, and
ls -lsa in the directory to see the file sizes. This should confirm a file size of
Delete all empty (corrupt) WSP files with:
sudo find /opt/graphite/storage/whisper/ -type f -empty -delete
You should now find the dashboard panels load properly.
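A more cautious variant of the cleanup above is to list the empty files and check free disk space before deleting anything. The sketch below is one way to do that. The helper names list_empty_wsp and show_free_space are hypothetical, and restricting the search to names ending in .wsp is an added assumption (the delete command above removes every empty file under the directory, whatever its name).

```shell
# Hypothetical helpers for a dry run before deleting corrupt Whisper files.
# The default directory is the one used in the delete command above.

# Print every empty file ending in .wsp under the given directory.
# Nothing is deleted. The '*.wsp' filter is an added assumption.
list_empty_wsp() {
  dir="${1:-/opt/graphite/storage/whisper}"
  find "$dir" -type f -name '*.wsp' -empty -print
}

# Show free space on the filesystem that holds the given directory,
# to confirm whether the disk is (still) full.
show_free_space() {
  df -h "${1:-/opt/graphite/storage/whisper}"
}
```

After sourcing these functions, running list_empty_wsp (with sudo if permissions require it) prints the candidate files, and show_free_space reports how much room is left. If the list looks right and there is space on the disk, the delete command above can then be run.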