CONCEPT Cited by 1 source
State in database, not queue¶
The authoritative state-of-record for pending work lives in the application database, not in the job queue. The queued job is a derivative — an optimisation for low-latency execution — not a durable commitment. Consequence: losing the queue is a performance problem, not a correctness problem.
The invariant¶
When work needs doing, two things happen atomically enough for the invariant to hold:
1. A row is written to the application database recording the intent (e.g. a databases row with state: 'pending').
2. A job is enqueued to execute the intent (e.g. DatabaseCreationJob.perform_async(id)).
Only #1 is the source of truth. #2 is an optimisation that lets the work run within worker latency (ms-scale) instead of within scheduler-tick latency (minute-scale).
The critical property: a scheduled job periodically re-derives step #2 from step #1. If #2 is lost (dropped perform_async, Redis flush, Redis corruption, operator mistake), the scheduler reconstructs it from the database.
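The two-step flow and the scheduler's recovery of a lost enqueue can be sketched in plain Ruby. This is a minimal illustration, not PlanetScale's implementation: the Provisioner class, its in-memory rows and queue, the drop_enqueue flag, and the "ready" state are all hypothetical stand-ins for the databases table, the Redis-backed queue, and a failed perform_async.

```ruby
# Hypothetical in-memory stand-ins for the databases table and the job queue.
class Provisioner
  attr_reader :rows, :queue

  def initialize
    @rows = {}   # id => state; the source of truth
    @queue = []  # ids of enqueued creation jobs; derivable from @rows
  end

  # Step 1 writes the row, step 2 enqueues the job.
  # drop_enqueue simulates an enqueue that is silently lost.
  def request_database(id, drop_enqueue: false)
    @rows[id] = "pending"
    @queue << id unless drop_enqueue
  end

  # Scheduler tick: enqueue a job again for every row still pending.
  def scheduler_tick
    @rows.each { |id, state| @queue << id if state == "pending" }
  end

  # Worker loop: drain the queue and mark each row as created.
  def run_workers
    while (id = @queue.shift)
      @rows[id] = "ready"
    end
  end
end

app = Provisioner.new
app.request_database(1)                      # normal path
app.request_database(2, drop_enqueue: true)  # step 2 is lost
app.run_workers                              # only row 1 is processed
app.scheduler_tick                           # row 2 is still pending, so it is re-enqueued
app.run_workers
```

Row 2 never reached the queue on the first attempt, yet it ends up processed because the tick rebuilt the job from the row. The cost of the lost enqueue is one tick of latency.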
Why this is not the default mental model¶
The default async-job mental model treats the queue as the integration point: "put something on the queue" is the action, and the queue's state is where pending work lives. That model assumes:
- The queue is durable (survives crashes).
- The enqueue call is reliable (can't be dropped).
- Queue mutations are ordered and visible to workers.
In practice, Redis-backed queues have modest durability (AOF sync windows, replication lag, operator errors), perform_async is a fire-and-forget network call that can fail after the caller thinks it succeeded, and multi-writer semantics can silently drop writes. Treating the queue as the state of record implicitly trusts all three properties.
The state-in-database model inverts the trust: the database is the only store whose durability is considered load-bearing. The queue is advisory.
How it works in practice¶
From sources/2026-04-21-planetscale-how-we-made-planetscales-background-jobs-self-healing:
"The solution is storing the state in our PlanetScale
database. When creating a database for a user, we also
create a record in our databases table immediately. This
record starts with a state set to pending."
"This allows us to have a scheduled job that runs once a
minute and checks if any databases are in a pending
state. If they are, that triggers the creation job to get
enqueued again."
The scheduler query is the inverse of the database model's state machine: for each job class, the database supports a predicate meaning "a row is in a state implying this work should run." On each tick, the scheduler selects the rows matching that predicate and enqueues the job for each one.
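One way to picture the per-job-class predicate is a registry mapping each job name to a "should this row trigger the job?" check. The sketch below is an assumption-laden illustration: the Row struct, the PENDING_WORK registry, DatabaseDeletionJob, and the "deleting" state are hypothetical, and the in-memory select stands in for what would be an indexed database query.

```ruby
# Hypothetical row shape; only state: "pending" comes from the source.
Row = Struct.new(:id, :state)

# One entry per job class: the lambda answers "does this row imply the job
# should run?". DatabaseDeletionJob and "deleting" are illustrative only.
PENDING_WORK = {
  "DatabaseCreationJob" => ->(row) { row.state == "pending" },
  "DatabaseDeletionJob" => ->(row) { row.state == "deleting" },
}.freeze

# One scheduler tick: return the [job_name, row_id] pairs to enqueue.
def jobs_to_enqueue(rows)
  PENDING_WORK.flat_map do |job_name, should_run|
    rows.select(&should_run).map { |row| [job_name, row.id] }
  end
end

rows = [Row.new(1, "pending"), Row.new(2, "ready"), Row.new(3, "deleting")]
enqueued = jobs_to_enqueue(rows)
# enqueued == [["DatabaseCreationJob", 1], ["DatabaseDeletionJob", 3]]
```

Row 2 is in a terminal state, so no predicate matches it and nothing is enqueued for it.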
Consequences¶
- Queue data loss is non-destructive. "If we lose all data in the queues at any time, we can recover without any loss in functionality." The recovery mechanism is the next scheduler tick.
- Jobs must be idempotent. A scheduler re-enqueue, an existing queue entry, and a Sidekiq retry can all fire for the same database row, so every job must tolerate being run more than once. See concepts/idempotent-job-design.
- Database schema has shape obligations. There must be a column (or expression) indexable for "is work pending?" The scheduler's query must be cheap to evaluate on every tick.
- Throughput budget shifts to the database. The scheduler tick issues a query per minute per worker-class. At scale this is non-trivial; it requires indexes on state columns.
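The idempotency requirement can be sketched as a job body that checks the row's state before acting and records completion in the row. This is a simplified illustration under stated assumptions: CreationJobSketch is a hypothetical plain Ruby class rather than a real Sidekiq job, the rows hash stands in for the databases table, and "ready" is an assumed terminal state.

```ruby
# Hypothetical stand-in for a creation job; not a real Sidekiq job class.
class CreationJobSketch
  attr_reader :provisioned

  def initialize(rows)
    @rows = rows        # id => { state: ... }, stand-in for the databases table
    @provisioned = []   # records the side effect so duplicates would be visible
  end

  # Acts only while the row is still pending, then moves the row out of
  # the pending state so later runs become no-ops.
  def perform(id)
    row = @rows.fetch(id)
    return :skipped unless row[:state] == "pending"

    @provisioned << id  # the real provisioning work would go here
    row[:state] = "ready"
    :created
  end
end

rows = { 1 => { state: "pending" } }
job = CreationJobSketch.new(rows)

# The same row is dispatched three times: original enqueue, scheduler
# re-enqueue, and a retry.
results = [job.perform(1), job.perform(1), job.perform(1)]
# results == [:created, :skipped, :skipped]
```

The sketch is single-threaded. With concurrent workers, the check and the state change would need to be atomic, for example through a row lock or a conditional update; that detail is outside what the source describes.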
Related: queue as view over DB state¶
An alternative framing: the queue is a materialised view over a database query. The database is the source; the queue is a cache with weaker durability. A stale or missing cache row doesn't lose data, just adds latency.
This framing generalises to other substrates:
- Kubernetes Pod state: the spec is in etcd; the kubelet's observed state is a derivable view.
- Outbox pattern: a DB row is written, a publisher later reads the row and publishes to Kafka; Kafka is downstream of the DB, not co-authoritative.
- Event sourcing: the event log is the source; projections are derivables that can be rebuilt.
When the invariant fails¶
- Work without a natural row. A one-off "send this specific email right now" job has no row in the database representing its pendingness. You can't scheduler-ify it without inventing a scheduled_emails table first.
- Cross-store work. If pending work is implied by state in Redis + MySQL + an external vendor, the scheduler can't issue one authoritative query. Either consolidate state first, or accept loss of derivability.
- State is destroyed by job execution. If FooJob deletes the row it was dispatched for, and state: 'pending' is the trigger, the scheduler relies on the job updating state to 'done' rather than deleting the row. Deletion semantics must be reconsidered.
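For the first case, giving rowless work a row might look like the following sketch. Everything here is hypothetical: the ScheduledEmail struct models an invented scheduled_emails table, and the column names, the "sent" state, and the ScheduledEmails class are assumptions made for illustration.

```ruby
# Hypothetical scheduled_emails row; column names are assumptions.
ScheduledEmail = Struct.new(:id, :recipient, :state)

class ScheduledEmails
  attr_reader :rows, :queue, :sent

  def initialize
    @rows = []    # stand-in for the scheduled_emails table
    @queue = []   # ids of enqueued send jobs
    @sent = []    # recipients actually emailed
    @next_id = 1
  end

  # Record the intent as a row first, then enqueue. drop_enqueue
  # simulates a lost enqueue.
  def send_later(recipient, drop_enqueue: false)
    email = ScheduledEmail.new(@next_id, recipient, "pending")
    @next_id += 1
    @rows << email
    @queue << email.id unless drop_enqueue
    email
  end

  # Scheduler tick: re-enqueue every email still pending.
  def scheduler_tick
    @rows.each { |email| @queue << email.id if email.state == "pending" }
  end

  # Worker loop: send each pending email once and mark the row sent
  # (updating state rather than deleting the row).
  def run_workers
    while (id = @queue.shift)
      email = @rows.find { |e| e.id == id }
      next unless email.state == "pending"

      @sent << email.recipient
      email.state = "sent"
    end
  end
end

emails = ScheduledEmails.new
emails.send_later("a@example.com", drop_enqueue: true)
emails.scheduler_tick
emails.scheduler_tick  # a second tick produces a duplicate queue entry
emails.run_workers
```

Once the email has a row, the lost enqueue is recoverable and the duplicate queue entry is harmless, because the worker checks the row's state before sending.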
Seen in¶
- sources/2026-04-21-planetscale-how-we-made-planetscales-background-jobs-self-healing: canonical wiki introduction. The two-step flow (write row with state: pending + enqueue job) is PlanetScale's standard shape for async-job-backed user actions. The scheduler is the reconciliation mechanism that makes the DB the source of truth and the queue a derivable optimisation.