Idempotent job design¶
An idempotent job is one that can be run multiple times for the same input and produces the same end state as running it once. Idempotence is a required discipline whenever a job can be enqueued more than once, which in practice means every async-job system with retries, schedulers, or data-loss recovery paths.
Why jobs get enqueued multiple times¶
In a self-healing job queue architecture, three independent sources can enqueue the same job for the same record:
- User-triggered enqueue — the original code path (e.g. user creates a database → DatabaseCreationJob.perform_async(id)).
- Scheduler re-enqueue — the paired scheduler sees the record still in pending state and re-enqueues.
- Sidekiq retry — a job that raised fails and is retried via the framework's retry policy.
All three can fire for the same record concurrently (or overlapping). Even without a self-healing design, source 3 alone is sufficient to require idempotence: Sidekiq's default is 25 retries with exponential backoff, and a job that succeeds partially then raises will retry from the start.
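To make the retry-from-the-start hazard concrete, here is a minimal, framework-free sketch; the `CHARGES` array and `perform_once` method are illustrative stand-ins, not PlanetScale's code:

```ruby
# Hypothetical non-idempotent job: the side effect (a charge) lands,
# then the job raises. Sidekiq's retry policy re-runs perform from the
# top, so the side effect is duplicated on every attempt.
CHARGES = []

def perform_once(database_id)
  CHARGES << database_id           # partial work completes...
  raise "transient network error"  # ...then the job fails
end

# Simulate the original run plus one framework retry:
2.times do
  begin
    perform_once(42)
  rescue RuntimeError
    # Sidekiq would catch this and schedule the retry
  end
end

CHARGES  # => [42, 42] — the charge was recorded twice
```

With 25 default retries, a transient failure after the side effect can repeat that side effect up to 26 times unless the job is written to tolerate re-execution.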
Three layers of protection¶
From sources/2026-04-21-planetscale-how-we-made-planetscales-background-jobs-self-healing:
"1. Exit quickly — We store state in our database and quickly exit a job if it no longer needs to be run."
"2. Use database locks — We avoid race conditions, such as when multiple jobs are updating the same data at once."
"3. Use sidekiq unique jobs — Sidekiq Enterprise includes the ability to have unique jobs. This will stop a duplicate job from ever being enqueued."
These three compose, each addressing a distinct failure mode:
Layer 1: state re-check at job entry¶
Cheapest layer. Protects against: the work having already been done by a previous (successful) run of the same job on the same record. Any subsequent re-enqueue just exits fast. No DB mutation required.
What it doesn't protect against: two workers both
reading pending? as true simultaneously and both
proceeding past the check. Needs layer 2 for that.
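A minimal sketch of the entry re-check, using an in-memory hash as a stand-in for the database; `Record`, `STORE`, and the state names are illustrative assumptions, not the article's code:

```ruby
# Layer 1: read current state at job entry and exit fast if the work
# is already done. STORE is an in-memory stand-in for the real DB row.
Record = Struct.new(:id, :state)
STORE = { 1 => Record.new(1, "pending") }

def perform(id)
  record = STORE[id]
  return :exited_early unless record&.state == "pending"  # exit quickly
  record.state = "created"  # the actual (expensive) work goes here
  :did_work
end

perform(1)  # => :did_work
perform(1)  # => :exited_early — a re-enqueue is a cheap no-op
```

The check costs one read and mutates nothing, which is why it belongs at the top of every job.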
Layer 2: database row lock around mutation¶
ActiveRecord::Base#with_lock issues SELECT ... FOR
UPDATE inside a transaction, serialising the row across
workers. Protects against: concurrent workers both
entering a critical section and both performing the
mutation.
What it doesn't protect against: the same worker re-running the same job and the lock releasing between runs. Needs layer 1 for that. And two different jobs of the same class being enqueued and sitting in the queue simultaneously — needs layer 3.
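Row-level locking needs a live database, but the serialisation it buys can be sketched with a Mutex standing in for SELECT ... FOR UPDATE; the `with_row_lock` helper and `LockedRecord` struct are illustrative, not ActiveRecord's API:

```ruby
# Layer 2 sketch: serialise the check-then-mutate critical section so
# two concurrent workers can't both see "pending" and both do the work.
LockedRecord = Struct.new(:id, :state, :row_lock)

# Stand-in for with_lock / SELECT ... FOR UPDATE:
def with_row_lock(record, &block)
  record.row_lock.synchronize(&block)
end

record = LockedRecord.new(1, "pending", Mutex.new)

results = 10.times.map do
  Thread.new do
    with_row_lock(record) do
      if record.state == "pending"
        sleep 0.001              # widen the race window
        record.state = "created" # mutation happens exactly once
        :did_work
      else
        :noop                    # lost the race; nothing left to do
      end
    end
  end
end.map(&:value)

results.count(:did_work)  # => 1 — only one worker performed the mutation
```

Note the state check is repeated *inside* the lock: the pre-lock check from layer 1 is an optimisation, not a guarantee.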
Layer 3: framework-level unique jobs¶
```ruby
class CheckDeploymentStatusJob < BaseJob
  sidekiq_options queue: "urgent", retry: 5,
                  unique_for: 1.minute,
                  unique_until: :start
  # ...
end
```
Sidekiq Enterprise's
unique_for rejects a duplicate enqueue at
perform_async time if an identical job (same class,
same args) is already in the queue. Protects against:
the scheduler + user both enqueuing within the same
1-minute window, the scheduler re-enqueuing while a
previous instance is still queued, retry storms.
What it doesn't protect against: re-enqueues spaced
more than unique_for apart. Needs layer 1 for
that. And race conditions during perform across
different instances. Needs layer 2.
Why all three, not one¶
No single layer covers all the cases:
| Scenario | L1 | L2 | L3 |
|---|---|---|---|
| Job ran successfully; re-enqueued | ✅ | — | ✅ |
| Two workers race on same row | — | ✅ | — |
| Queue has two entries for same record | — | — | ✅ |
| Job partially ran; retry | depends | ✅ | — |
| Scheduler + user enqueue simultaneously | — | — | ✅ |
The three layers each close a different set of vulnerabilities. Skipping any of them leaves a class of production bugs open.
Cheap to expensive¶
Layer 1 (state check) is the cheapest — a single DB read, no locks, no framework overhead. Make every job start with it.
Layer 2 (DB lock) is the most expensive at scale —
holding a row lock serialises all workers on that row.
Use only around the mutation, not around the whole
perform.
Layer 3 (unique jobs) is free at perform time but
requires Sidekiq Enterprise (paid). Use selectively
for jobs where enqueue storms are expected (e.g.
status-check jobs that might be triggered from many
different places).
Natural unique keys¶
All three layers rely on jobs having a natural unique
key — usually a single DB row. Jobs dispatched with
perform_async(id) where id is a stable record
identifier compose well with all three layers.
Jobs with no natural unique key (e.g.
send_promo_email(campaign_params)) can't be made
idempotent by these three layers; they need explicit
deduplication tokens (see
concepts/idempotency-token).
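A minimal sketch of such a token; the `PROCESSED` set and the `send_promo_email` signature are illustrative assumptions (see concepts/idempotency-token for the full construct):

```ruby
require "set"

# Explicit deduplication token for a job with no natural unique key:
# the caller mints the token once at enqueue time, and the worker
# records it before doing the work, so re-runs become no-ops.
PROCESSED = Set.new

def send_promo_email(token, campaign_params)
  return :duplicate unless PROCESSED.add?(token)  # token already seen
  # ... actually send the email here ...
  :sent
end

token = "campaign-2026-spring-user-17"  # minted once by the caller
send_promo_email(token, {})  # => :sent
send_promo_email(token, {})  # => :duplicate
```

The token plays the role the record id plays for row-backed jobs: a stable identity the dedup check can key on. In production the set would live in the database or Redis, not process memory.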
Relationship to concepts/idempotency-token¶
idempotency-token is the request/operation-level
construct: a per-call identifier attached at the client
that the server deduplicates against. Typically used for
API calls (HTTP retries, hedged reads/writes).
idempotent-job-design is the worker-level
construct: how the job's perform method is written so
that the same conceptual work is safe to re-execute.
Often uses idempotency-token internally (e.g. a job
that hits an external API uses an idempotency token on
that call), but the overall job-level idempotence is
about the interaction with the application database.
Seen in¶
- sources/2026-04-21-planetscale-how-we-made-planetscales-background-jobs-self-healing — canonical wiki introduction. PlanetScale's three-layer composition (early-exit + with_lock + unique_for) is named explicitly as their duplicate-job defence; each layer canonicalised with a code snippet. Required because the scheduler pattern deliberately introduces duplicate enqueues.