This page describes what to do when this Icinga alert fires. For more information, search the govuk-puppet repo for the source of the alert.

Icinga alerts

Prolonged GC collection times

This check alerts when Elasticsearch JVM garbage collection times (in milliseconds) exceed the warning or critical levels. The times are collected by Graphite via collectd from the Elasticsearch API.

The check currently uses a Graphite function to summarise the metric over a 5-minute period and take the maximum value in that period.
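
As a rough illustration of that summarising step, the Python sketch below groups samples into 5-minute windows, keeps the maximum of each window, and compares it against thresholds. It is not the check's actual definition: the function names, the sample data and the threshold values are all hypothetical, and the real Graphite expression and levels live in govuk-puppet.

```python
from collections import defaultdict


def max_per_window(samples, window_seconds=300):
    """Group (unix_timestamp, value) samples into fixed windows
    and return the maximum value seen in each window."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        if value is None:  # skip gaps where no datapoint was recorded
            continue
        window_start = timestamp - (timestamp % window_seconds)
        buckets[window_start].append(value)
    return {start: max(values) for start, values in sorted(buckets.items())}


def classify(value, warning, critical):
    """Map a summarised value to an alert state.
    The thresholds are supplied by the caller; none are taken from the real check."""
    if value >= critical:
        return "CRITICAL"
    if value >= warning:
        return "WARNING"
    return "OK"


# Hypothetical GC times in milliseconds, sampled at irregular intervals.
samples = [(0, 10), (60, 250), (299, 40), (300, 5), (420, None), (599, 90)]
windows = max_per_window(samples)
for start, peak in windows.items():
    print(start, peak, classify(peak, warning=150, critical=200))
```

Taking the maximum rather than the average means a single long collection inside the window is enough to trigger the alert.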

To find the current value, create a tunnel to Elasticsearch and then use curl to query the stats API:

$ ssh -At jumpbox.production.govuk.digital -L 9200:localhost:9200 "ssh -q \`govuk_node_list --single-node -c search\` -L 9200:elasticsearch5.blue.production.govuk-internal.digital:80"
$ curl 'localhost:9200/_nodes/stats/jvm?pretty'

Look for the collection_time_in_millis values. There are two, old and young, which correspond to different portions of the JVM heap; both are checked by Nagios. The lower these times are, the better. Another important value is heap_used_percent, which should also be low: if it gets too high, it may prevent garbage collection from completing.
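
The full response can be long, so it may help to pull out just these fields per node. The sketch below is one way to do that in Python, assuming the usual node stats layout (nodes, then jvm.mem and jvm.gc.collectors). The function name and the sample response are hypothetical, and the sample numbers are illustrative rather than real measurements.

```python
import json


def summarise_jvm_stats(stats):
    """Extract heap usage and GC collection times for each node
    from a parsed _nodes/stats/jvm response."""
    rows = []
    for node_id, node in stats.get("nodes", {}).items():
        jvm = node.get("jvm", {})
        collectors = jvm.get("gc", {}).get("collectors", {})
        rows.append({
            "node": node.get("name", node_id),
            "heap_used_percent": jvm.get("mem", {}).get("heap_used_percent"),
            "young_ms": collectors.get("young", {}).get("collection_time_in_millis"),
            "old_ms": collectors.get("old", {}).get("collection_time_in_millis"),
        })
    return rows


# Hypothetical, heavily trimmed response body for a single node.
SAMPLE_RESPONSE = json.loads("""
{
  "nodes": {
    "abc123": {
      "name": "example-node",
      "jvm": {
        "mem": {"heap_used_percent": 42},
        "gc": {
          "collectors": {
            "young": {"collection_count": 100, "collection_time_in_millis": 1500},
            "old": {"collection_count": 2, "collection_time_in_millis": 300}
          }
        }
      }
    }
  }
}
""")

for row in summarise_jvm_stats(SAMPLE_RESPONSE):
    print(row)
```

To use it against the tunnel, replace the sample with the JSON returned by the curl command above. Missing fields come back as None rather than raising an error, so a partial response still produces a row for each node.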

Solutions

GOV.UK has a support contract with AWS for the managed Elasticsearch service. See ‘Fixing issues with the cluster’ for further information.

This page was last reviewed on 27 March 2019. It needs to be reviewed again on 27 September 2019 by the page owner #govuk-2ndline.