PATTERN Cited by 1 source

Postgres queue on same database as application

Problem

An application needs a job queue to track asynchronous work (sending emails, generating reports, running invoicing). Using an external queue (SQS, Kafka, RabbitMQ) requires coordinating two systems: if the application transaction commits but the queue enqueue fails, the job is lost; if the enqueue commits but the transaction rolls back, a phantom job runs. Distributed-transaction solutions (outbox pattern, 2PC) are non-trivial to get right.

Solution

Keep the job queue as a table in the same Postgres database as the application. The job row and the application mutation commit together in a single local transaction:

  • If the transaction rolls back, the job row rolls back with it.
  • If the transaction commits, the job is durably visible to workers.
  • Workers claim rows with SELECT ... FOR UPDATE SKIP LOCKED and commit a delete on success.

Canonical table shape (from concepts/postgres-queue-table):

CREATE TABLE jobs (
  id BIGSERIAL PRIMARY KEY,
  run_at TIMESTAMPTZ DEFAULT now(),
  status TEXT DEFAULT 'pending',
  payload JSONB
);
CREATE INDEX idx_jobs_fetch ON jobs (run_at) WHERE status = 'pending';
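The worker-side claim-and-complete step described above can be sketched as a single transaction against that table (a sketch; the two statements would normally be issued by worker code, with the claimed id bound into the DELETE):

```sql
BEGIN;

-- Claim the oldest runnable job. SKIP LOCKED makes concurrent
-- workers pass over rows another worker has already locked
-- instead of blocking on them.
SELECT id, payload
FROM jobs
WHERE status = 'pending' AND run_at <= now()
ORDER BY run_at
LIMIT 1
FOR UPDATE SKIP LOCKED;

-- ... perform the job in application code ...

-- On success, delete the row in the same transaction; a rollback
-- releases the lock and the job becomes claimable again.
DELETE FROM jobs WHERE id = $1;  -- $1 = the claimed id

COMMIT;
```

Each completed job therefore leaves one dead tuple behind (the deleted row), which is what the caveats below are about.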

Structural caveats

This pattern has known failure modes that depend on the other workloads running on the same cluster, not on the queue itself:

  1. Mixed-workload clusters pin the MVCC horizon. Long-running analytics transactions, or chains of continuously-overlapping shorter ones, prevent autovacuum from reclaiming the dead tuples produced by every job delete. Dead tuples accumulate in the B-tree index backing the ORDER BY run_at fetch, lock acquisition times rise, and queue depth grows.
  2. Postgres timeouts don't fix it. statement_timeout, idle_in_transaction_session_timeout, and transaction_timeout each bound a single session's statement, idle period, or transaction; none of them limits how many such transactions a workload class keeps in flight at once, so a stream of short, continuously overlapping analytics queries defeats all three.
  3. Autovacuum tuning doesn't fix it. Autovacuum can only reclaim tuples older than the horizon; tuning aggression has no effect when the horizon is pinned.
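One way to check whether a given cluster is in this state is to look at the oldest transactions still holding back the horizon, and at dead-tuple accumulation on the queue table (a diagnostic sketch using standard Postgres statistics views):

```sql
-- Oldest transactions still visible to MVCC; any of these can pin
-- the horizon for the whole cluster, regardless of which database
-- they run in.
SELECT pid, datname, xact_start, backend_xmin, query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY xact_start
LIMIT 5;

-- Dead-tuple accumulation on the queue table itself.
SELECT n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'jobs';
```

A steadily growing n_dead_tup despite recent autovacuum runs is the signature of a pinned horizon rather than an under-tuned autovacuum.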

When the pattern works cleanly

  • Single-workload clusters dedicated to application + queue with no independent analytics or long-lived readers.
  • Short worker transactions (sub-millisecond job completion).
  • Active workload-class concurrency management via patterns/workload-class-resource-budget (PlanetScale Traffic Control, or hand-rolled pgbouncer pool-mode caps + app-side retry logic).
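A hand-rolled version of the pgbouncer cap mentioned above might look like the following (a sketch; database names and pool sizes are illustrative, and the cap limits connection concurrency only, so it must be paired with app-side retry when the analytics pool is saturated):

```ini
; pgbouncer.ini -- give analytics its own small pool so overlapping
; analytics transactions cannot monopolize the cluster and pin the
; MVCC horizon indefinitely.
[databases]
app       = host=127.0.0.1 dbname=app pool_size=40
analytics = host=127.0.0.1 dbname=app pool_size=4

[pgbouncer]
pool_mode = transaction
max_client_conn = 500
```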

When to use a dedicated queue instead

  • Read replicas with hot-standby feedback, logical replication subscribers, or long-lived application transactions mean the horizon is effectively uncontrollable from the queue's perspective.
  • Queue throughput is in the tens of thousands of jobs/sec range (where dedicated systems like Kafka dominate regardless of MVCC dynamics).
  • You don't actually need transactional coupling between the application mutation and the job enqueue — in which case SQS + outbox is simpler than managing horizon dynamics.
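For reference, the outbox variant mentioned above keeps only a thin relay table in Postgres and moves the queue itself to the external system (a sketch; table and column names are illustrative):

```sql
-- Outbox: the application transaction inserts here; a single relay
-- process reads committed rows, publishes them to SQS/Kafka, then
-- marks them published. Only the relay touches this table, so its
-- MVCC churn is small and predictable.
CREATE TABLE outbox (
  id           BIGSERIAL PRIMARY KEY,
  published_at TIMESTAMPTZ,          -- NULL until relayed
  payload      JSONB NOT NULL
);
CREATE INDEX idx_outbox_unpublished ON outbox (id)
  WHERE published_at IS NULL;
```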

Seen in

  • sources/2026-04-11-planetscale-keeping-a-postgres-queue-healthy — Canonical wiki framing of the pattern's modern viability. Griggs reproduces Brandur Leach's 2015 degradation test on Postgres 18 and shows the pattern still works — but only if mixed-workload concurrency is actively managed. The conclusion: "Modern Postgres has raised the threshold — B-tree improvements and SKIP LOCKED buy significant headroom — but the underlying mechanism is unchanged… In a 'just use Postgres' world where queues, analytics, and application logic share a single database, this is not a theoretical risk. It is the normal operating condition."