Email Alert API: Unprocessed work
This alert indicates that Email Alert API has work that has not been processed within the (generous) amount of time we expect it to take. Which alert you see depends on the type of work.
- unprocessed content changes

  This means there is a significant delay in generating emails for subscribers with “immediate” frequency subscriptions in response to a change in some content on GOV.UK.

- unprocessed messages

  This means there is a significant delay in generating emails for subscribers with “immediate” frequency subscriptions in response to a custom message.

- incomplete digest runs

  This could be due to a failure in any of three workers (a sketch of the final step follows this list):

  - [Daily/Weekly]DigestInitiatorWorker generates a DigestRunSubscriber work item for each subscriber.
  - DigestEmailGenerationWorker does the work of generating the digest email for a specific subscriber.
  - DigestRunCompletionMarkerWorker periodically scans all the work items to see if the run is complete.
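To make that last step concrete, the completion check amounts to something like the sketch below. This is an illustration only: the `digest_run_id` and `processed_at` columns on DigestRunSubscriber are assumed names, so check the real DigestRunCompletionMarkerWorker before relying on it.

```ruby
# Sketch only: illustrates the "scan work items, mark run complete" idea.
# Assumes DigestRunSubscriber belongs to a DigestRun via digest_run_id and
# records a processed_at timestamp once its email has been generated -
# the real schema may differ.
class ExampleDigestRunCompletionCheck
  def perform
    DigestRun.where(completed_at: nil).find_each do |digest_run|
      unprocessed = DigestRunSubscriber
        .where(digest_run_id: digest_run.id, processed_at: nil)

      # Only mark the run complete once every work item has been processed.
      digest_run.update!(completed_at: Time.zone.now) if unprocessed.none?
    end
  end
end
```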
Each of the alerts is based on custom metrics that we collect using a periodic job. The metric will be something like “amount of unprocessed work older than X amount of time” (example).
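As a rough sketch of what such a periodic job looks like (the class name, the `processed_at` scope and the metric key are illustrative assumptions, and `GovukStatsd.gauge` stands in for whatever metrics client is actually used):

```ruby
# Illustrative sketch of a periodic metrics job; not the real exporter.
class ExampleUnprocessedWorkMetricsWorker
  include Sidekiq::Worker

  THRESHOLD = 2.hours # "older than X amount of time"

  def perform
    unprocessed = ContentChange
      .where(processed_at: nil)
      .where("created_at < ?", THRESHOLD.ago)
      .count

    # Export as a gauge so the alerting system can threshold on it.
    GovukStatsd.gauge("content_changes.unprocessed_total", unprocessed)
  end
end
```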
Automatic recovery
Sometimes we lose work due to a flaw in the Sidekiq queueing system. To cope with this, a RecoverLostJobsWorker runs every 30 minutes and tries to requeue any work that has not been processed within an hour. If work is being repeatedly lost, the alert will fire and you’ll need to investigate manually.
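In outline, the recovery pass does something like the sketch below. The model and worker names (`ContentChange#processed_at`, `ProcessContentChangeWorker`) are assumptions for illustration - the real RecoverLostJobsWorker is the source of truth.

```ruby
# Sketch of the "requeue work not processed within an hour" idea.
class ExampleRecoverLostJobsWorker
  include Sidekiq::Worker

  def perform
    ContentChange
      .where(processed_at: nil)
      .where("created_at < ?", 1.hour.ago)
      .find_each do |content_change|
        # Re-enqueue; the downstream worker should be idempotent, so a
        # duplicate enqueue is harmless.
        ProcessContentChangeWorker.perform_async(content_change.id)
      end
  end
end
```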
Manual steps to fix
Things to check:
- Check Sentry for errors.
- Check the Sidekiq dashboard for worker failures.
- Check Kibana for errors - use `@fields.worker: <worker class>` for the query.
- Check the Email Alert API Technical dashboard for performance issues.
If all else fails, you can try running the work manually from a console. The automatic recovery worker code is a good example of how to do this, but you will need to use `new.perform` instead of `perform_async`.
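For example, to process a single stuck item from a Rails console (the model and worker names here are illustrative - use whichever worker the alert relates to):

```ruby
# Run the work inline rather than enqueueing it, so errors surface in the console.
content_change = ContentChange.where(processed_at: nil).order(:created_at).first

# In normal operation this would be enqueued with perform_async; from a
# console, call the instance method directly instead.
ProcessContentChangeWorker.new.perform(content_change.id)
```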
A digest run may be “complete” - all work items generated, all work items processed - but not marked as such. In this case, you will need to use slightly different commands to investigate the incomplete run:
```ruby
# find which digests are "incomplete"
DigestRun.where("created_at < ?", 1.hour.ago).where(completed_at: nil)

# try manually marking it as complete
DigestRunCompletionMarkerWorker.new.perform
```