Many of our applications use Sidekiq for background job processing.
There’s a GOV.UK wrapper that will help you set it up.
Sidekiq has built-in retry logic, which is turned on by default but is configurable. We graph Sidekiq job stats: successes, failures, job timings and retry counts. These can be found under the statsd Graphite namespace.
Jobs do fail. This is not inherently bad and can happen for a number of reasons. As long as retries are enabled, a failed job is retried with an exponential backoff, with the retries spread over up to 21 days. A high number of retries suggests that a bigger, less transient problem may be occurring.
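To give a feel for how the backoff adds up to roughly 21 days, here is a minimal sketch. It assumes the delay before each retry is about `retry_count ** 4 + 15` seconds and that there are 25 retries by default. That matches Sidekiq's documented default behaviour as far as we know, but the real implementation also adds random jitter, so check the Sidekiq version your application uses. The method names below are hypothetical and not part of Sidekiq.

```ruby
# Approximate Sidekiq's default retry schedule, ignoring random jitter.
# ASSUMPTION: the delay before retry n (counting from 0) is n**4 + 15 seconds
# and the default maximum is 25 retries. These helpers are illustrative only
# and are not part of the Sidekiq API.
DEFAULT_MAX_RETRIES = 25

# Seconds to wait before the given retry attempt (0-based), without jitter.
def approximate_retry_delay(retry_count)
  (retry_count**4) + 15
end

# Total seconds spent waiting across all retries before the job is given up on.
def total_retry_window(max_retries = DEFAULT_MAX_RETRIES)
  (0...max_retries).sum { |n| approximate_retry_delay(n) }
end

# Print the delay for a few attempts, then the overall window in days.
[0, 1, 5, 10, 24].each do |n|
  puts "retry #{n}: wait about #{approximate_retry_delay(n)} seconds"
end
puts format("total window: about %.1f days", total_retry_window / 86_400.0)
```

Under these assumptions the first few retries happen within seconds or minutes, while the last one waits several days. This is why a short outage in a dependency usually clears itself, and why jobs that are still retrying days later point to a problem that needs investigating.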
If an alert fires, a good place to start investigating is the Sidekiq monitor.
This page is owned by #2ndline and needs to be reviewed.