PATTERN Cited by 1 source

Postgres queue on same database as application

Problem

An application needs a job queue to track asynchronous work (sending emails, generating reports, running invoicing). Using an external queue (SQS, Kafka, RabbitMQ) requires coordinating two systems: if the application transaction commits but the queue enqueue fails, the job is lost; if the enqueue commits but the transaction rolls back, a phantom job runs. Distributed-transaction solutions (outbox pattern, 2PC) are non-trivial to get right.

Solution

Keep the job queue as a table in the same Postgres database as the application. The job row and the application mutation commit together in a single local transaction:

  • If the transaction rolls back, the job row rolls back with it.
  • If the transaction commits, the job is durably visible to workers.
  • Workers claim rows with SELECT ... FOR UPDATE SKIP LOCKED and commit a delete on success.

Canonical table shape (from concepts/postgres-queue-table):

CREATE TABLE jobs (
  id BIGSERIAL PRIMARY KEY,
  run_at TIMESTAMPTZ DEFAULT now(),
  status TEXT DEFAULT 'pending',
  payload JSONB
);
CREATE INDEX idx_jobs_fetch ON jobs (run_at) WHERE status = 'pending';
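The worker-side claim-and-complete step described above can be sketched as a single transaction against that table (a sketch; the two statements would normally be issued by worker code, with the claimed id bound into the DELETE):

```sql
BEGIN;

-- Claim the oldest runnable job. SKIP LOCKED makes concurrent
-- workers pass over rows another worker has already locked
-- instead of blocking on them.
SELECT id, payload
FROM jobs
WHERE status = 'pending' AND run_at <= now()
ORDER BY run_at
LIMIT 1
FOR UPDATE SKIP LOCKED;

-- ... perform the job in application code ...

-- On success, delete the row in the same transaction; a rollback
-- releases the lock and the job becomes claimable again.
DELETE FROM jobs WHERE id = $1;  -- $1 = the claimed id

COMMIT;
```

Each completed job therefore leaves one dead tuple behind (the deleted row), which is what the caveats below are about.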

Structural caveats

This pattern has known failure modes that depend on the other workloads running on the same cluster, not on the queue itself:

  1. Mixed-workload clusters pin the MVCC horizon. Long-running analytics transactions, or chains of continuously-overlapping shorter ones, prevent autovacuum from reclaiming the dead tuples produced by every job delete. Dead tuples accumulate in the B-tree index backing the ORDER BY run_at fetch, lock acquisition times rise, and queue depth grows.
  2. Postgres timeouts don't fix it. statement_timeout, idle_in_transaction_session_timeout, and transaction_timeout each bound a single session's statement, idle period, or transaction; none of them limits how many such transactions a workload class keeps in flight at once, so a stream of short, continuously overlapping analytics queries defeats all three.
  3. Autovacuum tuning doesn't fix it. Autovacuum can only reclaim tuples older than the horizon; tuning aggression has no effect when the horizon is pinned.
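One way to check whether a given cluster is in this state is to look at the oldest transactions still holding back the horizon, and at dead-tuple accumulation on the queue table (a diagnostic sketch using standard Postgres statistics views):

```sql
-- Oldest transactions still visible to MVCC; any of these can pin
-- the horizon for the whole cluster, regardless of which database
-- they run in.
SELECT pid, datname, xact_start, backend_xmin, query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY xact_start
LIMIT 5;

-- Dead-tuple accumulation on the queue table itself.
SELECT n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'jobs';
```

A steadily growing n_dead_tup despite recent autovacuum runs is the signature of a pinned horizon rather than an under-tuned autovacuum.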

When the pattern works cleanly

  • Single-workload clusters dedicated to application + queue with no independent analytics or long-lived readers.
  • Short worker transactions (sub-millisecond job completion).
  • Active workload-class concurrency management via patterns/workload-class-resource-budget (PlanetScale Traffic Control, or hand-rolled pgbouncer pool-mode caps + app-side retry logic).
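A hand-rolled version of the pgbouncer cap mentioned above might look like the following (a sketch; database names and pool sizes are illustrative, and the cap limits connection concurrency only, so it must be paired with app-side retry when the analytics pool is saturated):

```ini
; pgbouncer.ini -- give analytics its own small pool so overlapping
; analytics transactions cannot monopolize the cluster and pin the
; MVCC horizon indefinitely.
[databases]
app       = host=127.0.0.1 dbname=app pool_size=40
analytics = host=127.0.0.1 dbname=app pool_size=4

[pgbouncer]
pool_mode = transaction
max_client_conn = 500
```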

When to use a dedicated queue instead

  • Read replicas with hot-standby feedback, logical replication subscribers, or long-lived application transactions mean the horizon is effectively uncontrollable from the queue's perspective.
  • Queue throughput is in the tens of thousands of jobs/sec range (where dedicated systems like Kafka dominate regardless of MVCC dynamics).
  • You don't actually need transactional coupling between the application mutation and the job enqueue — in which case SQS + outbox is simpler than managing horizon dynamics.
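For reference, the outbox variant mentioned above keeps only a thin relay table in Postgres and moves the queue itself to the external system (a sketch; table and column names are illustrative):

```sql
-- Outbox: the application transaction inserts here; a single relay
-- process reads committed rows, publishes them to SQS/Kafka, then
-- marks them published. Only the relay touches this table, so its
-- MVCC churn is small and predictable.
CREATE TABLE outbox (
  id           BIGSERIAL PRIMARY KEY,
  published_at TIMESTAMPTZ,          -- NULL until relayed
  payload      JSONB NOT NULL
);
CREATE INDEX idx_outbox_unpublished ON outbox (id)
  WHERE published_at IS NULL;
```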

Seen in

  • sources/2026-04-11-planetscale-keeping-a-postgres-queue-healthy — Canonical wiki framing of the pattern's modern viability. Griggs reproduces Brandur Leach's 2015 degradation test on Postgres 18 and shows the pattern still works — but only if mixed-workload concurrency is actively managed. The conclusion: "Modern Postgres has raised the threshold — B-tree improvements and SKIP LOCKED buy significant headroom — but the underlying mechanism is unchanged… In a 'just use Postgres' world where queues, analytics, and application logic share a single database, this is not a theoretical risk. It is the normal operating condition."