

State in database, not queue

The authoritative state-of-record for pending work lives in the application database, not in the job queue. The queued job is a derivative — an optimisation for low-latency execution — not a durable commitment. Consequence: losing the queue is a performance problem, not a correctness problem.

The invariant

When work needs doing, two things happen atomically enough for the invariant to hold:

  1. A row is written to the application database recording the intent (e.g. a row in the databases table with state: 'pending').
  2. A job is enqueued to execute the intent (e.g. DatabaseCreationJob.perform_async(id)).

Only #1 is the source of truth. #2 is an optimisation that lets the work run within worker latency (ms-scale) instead of within scheduler-tick latency (minute-scale).
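A minimal sketch of the two steps, under stated assumptions: DB and QUEUE are in-memory stand-ins for the databases table and the Redis-backed queue, and create_database is a hypothetical helper (in production, step 1 would be an INSERT and step 2 a DatabaseCreationJob.perform_async call):

```ruby
# Hedged sketch: DB stands in for the databases table, QUEUE for the
# Redis-backed job queue. Neither name comes from the source.
DB    = []
QUEUE = []

def create_database(name)
  # Step 1: record the intent durably -- this row is the source of truth.
  row = { id: DB.size + 1, name: name, state: "pending" }
  DB << row
  # Step 2: enqueue for low-latency execution -- advisory, safe to lose.
  QUEUE << { job: "DatabaseCreationJob", id: row[:id] }
  row[:id]
end
```

Losing QUEUE after this call leaves the pending row intact, so a scheduler tick can reconstruct the job from the row alone.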

The critical property: a scheduled job periodically re-derives step #2 from step #1. If #2 is lost — dropped perform_async, Redis flush, Redis corruption, operator mistake — the scheduler reconstructs it from the database.
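A hedged sketch of that re-derivation, again with in-memory stand-ins (the real version would be a scheduled job issuing a SQL query and calling perform_async):

```ruby
# Hedged sketch: re-deriving step #2 from step #1 after total queue loss.
DB    = [{ id: 1, state: "pending" }, { id: 2, state: "ready" }]
QUEUE = []  # pretend Redis was just flushed

def scheduler_tick(db, queue)
  # Every row still in a work-implying state gets its job re-enqueued.
  db.select { |row| row[:state] == "pending" }.each do |row|
    queue << { job: "DatabaseCreationJob", id: row[:id] }
  end
end

scheduler_tick(DB, QUEUE)  # the pending row's job is reconstructed
```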

Why this is not the default mental model

The default async-job mental model treats the queue as the integration point: "put something on the queue" is the action, and the queue's state is where pending work lives. That model assumes:

  • The queue is durable (survives crashes).
  • The enqueue call is reliable (can't be dropped).
  • Queue mutations are ordered and visible to workers.

In practice, Redis-backed queues have modest durability (AOF sync windows, replication lag, operator errors), perform_async is a fire-and-forget network call that can fail after the caller believes it succeeded, and multi-writer races can silently drop mutations. Treating the queue as the state of record implicitly trusts all of these properties.

The state-in-database model inverts the trust: the database is the only store whose durability is considered load-bearing. The queue is advisory.

How it works in practice

From sources/2026-04-21-planetscale-how-we-made-planetscales-background-jobs-self-healing:

"The solution is storing the state in our PlanetScale database. When creating a database for a user, we also create a record in our databases table immediately. This record starts with a state set to pending."

"This allows us to have a scheduled job that runs once a minute and checks if any databases are in a pending state. If they are, that triggers the creation job to get enqueued again."

The scheduler query is the inverse of the database model's state machine: for each job class that performs terminal work, there is a predicate over rows — "this row is in a state implying the work should run." The scheduler iterates the rows matching that predicate and enqueues the corresponding job.
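One way to sketch that job-class-to-predicate mapping (all names here are illustrative, not from the source):

```ruby
# Hedged sketch: each job class paired with the predicate the scheduler
# evaluates on every tick to decide whether work is pending.
PENDING_WORK = {
  "DatabaseCreationJob" => ->(row) { row[:state] == "pending" },
  "DatabaseDeletionJob" => ->(row) { row[:state] == "deleting" },
}

def tick(rows, queue)
  PENDING_WORK.each do |job, pred|
    rows.select(&pred).each { |row| queue << [job, row[:id]] }
  end
end
```

In a real system each predicate would be an indexed SQL WHERE clause rather than an in-memory filter.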

Consequences

  • Queue data loss is non-destructive. "If we lose all data in the queues at any time, we can recover without any loss in functionality." The recovery mechanism is the next scheduler tick.
  • Jobs must be idempotent. Re-enqueue from scheduler + existing queue entry + Sidekiq retry can all fire for the same database row; every job must tolerate being run more than once. See concepts/idempotent-job-design.
  • Database schema has shape obligations. There must be a column (or expression) indexable for "is work pending?" The scheduler's query must be cheap to evaluate on every tick.
  • Throughput budget shifts to the database. The scheduler tick issues a query per minute per worker-class. At scale this is non-trivial; it requires indexes on state columns.
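The idempotency obligation can be sketched as a state guard at the top of the job — a simplified version with hypothetical names; real implementations also need a row lock or compare-and-swap to close the check-then-act race:

```ruby
DB = [{ id: 1, state: "pending" }]

# Hedged sketch: the job re-checks the row's state before doing work, so a
# scheduler re-enqueue, a stale queue entry, and a Sidekiq retry all collapse
# into one effective execution.
def database_creation_job(id)
  row = DB.find { |r| r[:id] == id }
  return :skipped unless row && row[:state] == "pending"
  # ... provision the database (elided) ...
  row[:state] = "done"  # terminal state stops future scheduler ticks re-firing
  :done
end
```

Calling it a second time for the same row is a no-op rather than a double-provision.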

An alternative framing: the queue is a materialised view over a database query. The database is the source; the queue is a cache with weaker durability. A stale or missing cache row doesn't lose data, just adds latency.

This framing generalises to other substrates:

  • Kubernetes Pod state: the spec is in etcd; the kubelet's observed state is a derivable view.
  • Outbox pattern: a DB row is written, a publisher later reads the row and publishes to Kafka; Kafka is downstream of the DB, not co-authoritative.
  • Event sourcing: the event log is the source; projections are derivables that can be rebuilt.

When the invariant fails

  • Work without a natural row. A one-off "send this specific email right now" job has no row in the database representing its pendingness. You can't scheduler-ify it without inventing a scheduled_emails table first.
  • Cross-store work. If pending work is implied by state in Redis + MySQL + an external vendor, the scheduler can't issue one authoritative query. Either consolidate state first, or accept loss of derivability.
  • State is destroyed by job execution. If FooJob deletes the row it was dispatched for, and state: 'pending' is the trigger, the scheduler relies on the job updating state to 'done' rather than deleting the row. Deletion semantics must be reconsidered.
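For the third failure mode, one common fix is a tombstone state: the job marks the row deleted instead of removing it, so the trigger state disappears but derivability survives. A hedged sketch with illustrative names:

```ruby
DB = [{ id: 1, state: "deleting" }]

def database_deletion_job(id)
  row = DB.find { |r| r[:id] == id }
  return :skipped unless row && row[:state] == "deleting"
  # ... tear down the real resource (elided) ...
  row[:state] = "deleted"  # tombstone, not a DELETE: the row remains queryable
  :done
end
```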
