Many of our applications use Sidekiq for background job processing.
There’s a GOV.UK wrapper that will help you set it up.
There are two ways to monitor Sidekiq: the Sidekiq Web interface or the Grafana dashboard.
Sidekiq comes with a web application that can display the current state of a Sidekiq installation.
We have configured this to monitor the multiple Sidekiq configurations used throughout GOV.UK.
We have restricted public access as the Web UI allows modifying the state of Sidekiq queues.
To gain access, connect to the office wireless network or the VPN, then set up SSH port forwarding to a backend box in the environment you wish to monitor:
$ ssh backend-1.backend.staging -CNL 9000:127.0.0.1:3211
Or on AWS:
$ ssh $(ssh integration "govuk_node_list --single-node -c backend").integration -CNL 9000:127.0.0.1:3211
To view your local Sidekiq queue, go to the sidekiq-monitoring app in the VM and run ./bin/foreman start for all applications, or run <app_name> for a specific app.
http://sidekiq-monitoring.dev.gov.uk:3211/ to see a list of all the GOV.UK applications whose Sidekiq status you can monitor
http://sidekiq-monitoring.dev.gov.uk/<app_name> to directly monitor a specific app
Sidekiq Grafana Dashboard
You can also monitor Sidekiq queue lengths using this Grafana dashboard. It is available in all environments.
See also: Add sidekiq-monitoring to your application.
Sidekiq has built-in retry logic (turned on by default, but configurable).
Middleware is used to send metrics (successes, failures, job timings and retries) to
statsd, which forwards the data to Graphite to be stored.
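As a rough illustration of how such middleware can work, here is a minimal sketch. It is not the GOV.UK wrapper's actual code. `FakeStatsd`, `MetricsMiddleware` and the metric names are hypothetical, and the sketch assumes Sidekiq's server middleware convention of a `call(worker, job, queue)` method that yields to run the job, plus a `retry_count` key in the job hash on retried jobs.

```ruby
# Hypothetical stand-in for a statsd client, so the sketch runs without
# any gems. A real application would use an actual statsd library.
class FakeStatsd
  attr_reader :counters, :timings

  def initialize
    @counters = Hash.new(0)
    @timings = []
  end

  def increment(key)
    @counters[key] += 1
  end

  def timing(key, milliseconds)
    @timings << [key, milliseconds]
  end
end

# Hypothetical middleware following the server middleware shape:
# Sidekiq invokes `call` and the block passed in runs the job itself.
class MetricsMiddleware
  def initialize(statsd)
    @statsd = statsd
  end

  def call(_worker, job, queue)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    # Assumption: a retried job carries a "retry_count" key in its payload.
    @statsd.increment("#{queue}.retry") if job["retry_count"]
    yield
    @statsd.increment("#{queue}.success")
  rescue StandardError
    @statsd.increment("#{queue}.failure")
    raise # re-raise so the retry logic still sees the failure
  ensure
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @statsd.timing("#{queue}.duration", (elapsed * 1000).round)
  end
end

# Usage: simulate one successful job and one failing retry.
statsd = FakeStatsd.new
middleware = MetricsMiddleware.new(statsd)

middleware.call(nil, { "class" => "ExampleWorker" }, "default") { :done }

begin
  middleware.call(nil, { "class" => "ExampleWorker", "retry_count" => 1 }, "default") do
    raise "simulated failure"
  end
rescue RuntimeError
  # The middleware re-raised the error after recording the failure.
end

p statsd.counters
```

Re-raising inside the `rescue` matters: swallowing the exception would hide the failure from Sidekiq and the job would never be retried.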
Jobs do fail. This is not inherently bad and can happen for a number of reasons. When a job fails, it gets retried with an exponential backoff (up to 21 days), as long as retries are enabled. A high number of retries suggests that a bigger, less transient problem may be occurring.
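To see where the figure of roughly 21 days comes from, the sketch below adds up the retry delays. It assumes Sidekiq's documented defaults: 25 retries, with a delay of about `retry_count ** 4 + 15` seconds before each one. Sidekiq also adds a small random jitter to each delay, which is left out here so the output is deterministic. The method and constant names are illustrative only.

```ruby
# Assumed default retry limit.
MAX_RETRIES = 25

# Deterministic part of the assumed default delay (in seconds) before
# a given retry. The random jitter Sidekiq adds is deliberately omitted.
def base_retry_delay(retry_count)
  (retry_count**4) + 15
end

delays = (0...MAX_RETRIES).map { |count| base_retry_delay(count) }
total_days = delays.sum / 86_400.0

puts "First five delays in seconds: #{delays.first(5).inspect}"
puts "Last delay in hours: #{(delays.last / 3600.0).round(1)}"
puts "Total wait across #{MAX_RETRIES} retries: #{total_days.round(1)} days"
```

The delays start at a few seconds and grow to several days, so a transient fault is usually retried quickly, while a persistent one keeps the job in the retry set for weeks before it is given up on.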