Sidekiq
Many of our applications use Sidekiq (see repo) for background job processing.
Sidekiq on GOV.UK
For redundancy, our publishing apps run on multiple containers (using Kubernetes). We would have all sorts of race conditions and difficulties querying Sidekiq if each app on each machine had its own instance of Sidekiq.
Therefore, we’ve built a GOV.UK wrapper for Sidekiq, called govuk_sidekiq. This allows all Sidekiq processes to talk to a single Redis instance. It also enables request tracing.
Retry logic
Sidekiq has built-in retry logic (turned on by default, but configurable).
Jobs do fail, but this is not inherently bad and can happen for a number of reasons. When a job fails it gets retried with an exponential backoff (up to 21 days), as long as retries are enabled. A high number of retries may indicate a bigger, less transient problem.
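To see where the "up to 21 days" figure comes from, the sketch below reproduces the backoff schedule in plain Ruby. It assumes the formula given in the Sidekiq wiki, `(retry_count ** 4) + 15` seconds plus a random jitter, and Sidekiq's default of 25 retries. The constant and method names are illustrative only and are not part of Sidekiq's API.

```ruby
# Assumed default: Sidekiq retries a failed job up to 25 times.
DEFAULT_MAX_RETRIES = 25

# Deterministic part of the delay (in seconds) before retry number
# `retry_count` (0-based). Sidekiq adds a random jitter on top of this,
# which is left out here so that the output is reproducible.
def base_retry_delay(retry_count)
  (retry_count**4) + 15
end

delays = (0...DEFAULT_MAX_RETRIES).map { |count| base_retry_delay(count) }
total_days = delays.sum / 86_400.0

puts "First five delays (seconds): #{delays.first(5).inspect}"
puts "Total time across all retries, excluding jitter: #{total_days.round(1)} days"
```

To change the number of retries for a single job class, Sidekiq provides `sidekiq_options retry: 5` (or `retry: false` to disable retries for that class).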
Monitoring
There are three approaches for monitoring Sidekiq:
- via the Grafana dashboard
- via the Rails console
- via the Sidekiq Web interface (for apps that have implemented it)
Sidekiq Grafana Dashboard
You can monitor Sidekiq queue lengths using the “Sidekiq: queue length, max delay” dashboard, which is available in all environments.
Sidekiq from the console
Sidekiq exposes a rich API which can be queried from the rails console.
The Stats class gives a nice overview:
Sidekiq::Stats.new
# => #<Sidekiq::Stats:0x00007fbdf0ac4a30 @stats={:processed=>114999987, :failed=>15129, :scheduled_size=>22741, :retry_size=>1, :dead_size=>0, :processes_size=>3, :default_queue_latency=>10162.526781797409, :workers_size=>90, :enqueued=>1508687}>
Sidekiq::Stats.new.queues
# => {"delivery_immediate_high"=>949953, "default"=>451201, "delivery_immediate"=>101006, "email_generation_immediate"=>0, "email_generation_digest"=>0, "cleanup"=>0, "process_and_generate_emails"=>0, "delivery_digest"=>0}
You can also query and iterate through the Queues directly:
Sidekiq::Queue.all
# => [#<Sidekiq::Queue:0x00007fe98b133590 @name="cleanup", @rname="queue:cleanup">, #<Sidekiq::Queue:0x00007fe98b133518 @name="default", @rname="queue:default">, etc...
Sidekiq::Queue.all.collect {|q| [q.name, q.size] }
# => [["cleanup", 0], ["default", 0], ["delivery_digest", 0], ["delivery_immediate", 0], ["delivery_immediate_high", 0], ["email_generation_digest", 0], ["process_and_generate_emails", 0]]
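When investigating a backlog, it can help to look at how long the oldest job has been waiting as well as the queue size. The sketch below shows one way to build such a report. `backlog_report` and `FakeQueue` are hypothetical names, and the figures are made up. `FakeQueue` stands in for `Sidekiq::Queue`, which responds to `name`, `size` and `latency`, so that the example runs without Redis.

```ruby
# Stand-in for Sidekiq::Queue so the sketch runs without Redis.
# Real queues respond to the same three methods.
FakeQueue = Struct.new(:name, :size, :latency)

# Hypothetical helper: returns the queues that have a backlog,
# largest first, as [name, size, latency-in-seconds] rows.
def backlog_report(queues)
  queues
    .reject { |queue| queue.size.zero? }
    .sort_by { |queue| -queue.size }
    .map { |queue| [queue.name, queue.size, queue.latency.round] }
end

# Illustrative data only.
queues = [
  FakeQueue.new("cleanup", 0, 0.0),
  FakeQueue.new("default", 451_201, 10_162.4),
  FakeQueue.new("delivery_immediate", 101_006, 35.2),
]

backlog_report(queues).each do |name, size, latency|
  puts "#{name}: #{size} jobs, oldest waiting #{latency}s"
end
```

In a Rails console the helper could be called as `backlog_report(Sidekiq::Queue.all)`.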
You can do things like find and delete jobs for a particular worker class:
Sidekiq::RetrySet.new.filter { |job| job.klass == "AssetManagerAttachmentMetadataWorker" }.map(&:delete)
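Be mindful that matching jobs can sit in more than one place, such as a queue, the ‘scheduled’ set and the ‘retries’ set, so you may want to delete from each. For example, to delete from the ‘default’ queue and the ‘retries’ set: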
Sidekiq::Queue.new("default").each { |job| job.delete if job.klass == "PresentPageToPublishingApiWorker" && job.args.first == "PublishingApi::EmbassiesIndexPresenter" }
Sidekiq::RetrySet.new.each { |job| job.delete if job.klass == "PresentPageToPublishingApiWorker" && job.args.first == "PublishingApi::EmbassiesIndexPresenter" }
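Because deletion cannot be undone, it may be worth counting the matching jobs before deleting them. The sketch below separates selection from deletion. `matching_jobs`, `FakeJob` and `SomeOtherPresenter` are hypothetical names. `FakeJob` stands in for Sidekiq's job records, which respond to `klass`, `args` and `delete`, so that the example runs without Redis.

```ruby
# Stand-in for a Sidekiq job record so the sketch runs without Redis.
# Real records respond to `klass`, `args` and `delete`.
FakeJob = Struct.new(:klass, :args, :deleted) do
  def delete
    self.deleted = true
  end
end

# Hypothetical helper: select jobs of a given class whose first
# argument matches, without deleting anything yet.
def matching_jobs(jobs, klass:, first_arg:)
  jobs.select { |job| job.klass == klass && job.args.first == first_arg }
end

# Illustrative data only.
jobs = [
  FakeJob.new("PresentPageToPublishingApiWorker", ["PublishingApi::EmbassiesIndexPresenter"], false),
  FakeJob.new("PresentPageToPublishingApiWorker", ["SomeOtherPresenter"], false),
  FakeJob.new("AssetManagerAttachmentMetadataWorker", [123], false),
]

targets = matching_jobs(
  jobs,
  klass: "PresentPageToPublishingApiWorker",
  first_arg: "PublishingApi::EmbassiesIndexPresenter",
)

# Check that the count looks right before deleting anything.
puts "Would delete #{targets.size} job(s)"
targets.each(&:delete)
```

In a console, the same helper could be passed `Sidekiq::Queue.new("default")`, `Sidekiq::RetrySet.new` or `Sidekiq::ScheduledSet.new`, all of which are enumerable.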
Sidekiq Web
Sidekiq comes with a web application, Sidekiq::Web, that can display the current state of Sidekiq’s queues for an application.
It needs to be configured and enabled on a per-app basis (example).
Sidekiq Web is enabled for the following applications (and requires the Sidekiq Admin permission in the relevant app in Signon):
Application | Sidekiq Web URL |
---|---|
Whitehall | https://whitehall-admin.publishing.service.gov.uk/sidekiq |
Apps that don’t have any ingress routes are accessed through port forwarding. Detailed instructions are in the documentation for each of the following applications, which have the Sidekiq Web UI enabled in this way:
Application | Documentation URL |
---|---|
Publishing API | https://github.com/alphagov/publishing-api/blob/main/docs/admin-tasks.md#viewing-the-sidekiq-ui |
NB, GOV.UK used to have a sidekiq-monitoring web app which monitored all GOV.UK Sidekiq configurations in one place, but this was removed when GOV.UK was replatformed to Kubernetes.