
Dead tuples and VACUUM

Definition

A dead tuple is a Postgres row version that has been logically deleted or superseded by an update but not yet physically removed. In Postgres's MVCC model, a DELETE sets the row's xmax transaction ID, making the row invisible to snapshots taken after the deleting transaction commits — but the row stays on disk until VACUUM reclaims its heap-page slot and any index entries that referenced it.
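The xmin/xmax bookkeeping is visible via system columns. A minimal sketch, assuming a hypothetical `jobs` table with an `id` column:

```sql
-- Inspect MVCC metadata on a hypothetical "jobs" table.
-- xmin = transaction that created this row version;
-- xmax = transaction that deleted or updated it (0 if still live);
-- ctid = the tuple's physical (page, slot) location in the heap.
SELECT xmin, xmax, ctid, id
FROM jobs
LIMIT 5;
```

A row whose xmax is set but which is still reachable this way is exactly a dead tuple awaiting VACUUM.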

VACUUM is Postgres's garbage collector. It scans heap pages and index pages, identifies tuples older than the MVCC horizon (i.e. no longer visible to any active snapshot), and marks their space reusable. Autovacuum (concepts/postgres-autovacuum) runs it automatically against tables whose dead-tuple count exceeds a threshold.
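With default settings, autovacuum triggers when n_dead_tup exceeds autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × reltuples (50 + 0.20 × reltuples by default). A sketch of how to spot tables past that trigger, assuming default settings and no per-table overrides:

```sql
-- Tables whose dead-tuple count exceeds the default autovacuum trigger:
-- threshold + scale_factor * reltuples (defaults: 50 + 0.20 * reltuples).
SELECT s.relname,
       s.n_dead_tup,
       50 + 0.2 * c.reltuples AS autovacuum_trigger
FROM pg_stat_user_tables s
JOIN pg_class c ON c.oid = s.relid
WHERE s.n_dead_tup > 50 + 0.2 * c.reltuples;
</imports>
```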

Why dead tuples have a cost even when invisible

Dead tuples aren't returned by SELECT, but Postgres's executor still encounters them:

  • Sequential scans read each tuple off the heap page and check its visibility metadata (xmin/xmax) before discarding it.
  • Index scans are more insidious: the B-tree accumulates leaf entries that point at dead heap tuples. Each dead index entry forces an extra heap-page fetch whose result the executor immediately discards — a hidden I/O on every index scan that touches it.

For a workload that repeatedly scans an index to pick the oldest pending row — like a job queue — dead-tuple accumulation in the index translates directly into longer lock times and more I/O per job claim. PlanetScale's Griggs measured lock time rising from 2–3 ms baseline to 180 ms spikes at 24k dead tuples on a modest PS-5 cluster (Source: sources/2026-04-11-planetscale-keeping-a-postgres-queue-healthy).
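A typical claim query for such a queue looks like the sketch below (the `jobs` table and its `status`/`created_at` columns are hypothetical; the source describes SKIP LOCKED + batching but not this exact statement). SKIP LOCKED keeps workers from blocking on each other's locked rows, but the ORDER BY index scan underneath still steps over every dead index entry, which is why claim latency tracks the dead-tuple count:

```sql
-- Claim and remove the oldest pending job in one statement.
DELETE FROM jobs
WHERE id = (
    SELECT id
    FROM jobs
    WHERE status = 'pending'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED   -- skip rows other workers already hold
)
RETURNING id;
```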

Why dead tuples don't always get reclaimed

VACUUM can only reclaim dead tuples older than the MVCC horizon — that is, tuples no active transaction could still see. If a long transaction or a chain of continuously overlapping transactions pins the horizon (concepts/mvcc-horizon), dead tuples accumulate indefinitely even if autovacuum runs on schedule.

The common failure mode is therefore not "autovacuum is too slow" but "autovacuum has nothing it can legally reclaim". The fix is to free the horizon (by limiting workload-class concurrency — see patterns/workload-class-resource-budget), not to make vacuum more aggressive.
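To see which backends are pinning the horizon, pg_stat_activity exposes each backend's oldest visible transaction ID. A diagnostic sketch:

```sql
-- Backends whose snapshot pins the MVCC horizon. age(backend_xmin) counts
-- how many transactions have started since this backend's oldest visible
-- xid; large values mean VACUUM cannot reclaim anything newer than that.
SELECT pid, state, xact_start, age(backend_xmin) AS xmin_age, query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;
```

The backend at the top of this list is the one to investigate (or kill) when dead tuples accumulate despite regular autovacuum runs.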

Relation to write churn

In a high-write-churn table like a queue, every completed job produces one dead tuple (the deleted row). At thousands of jobs/sec, dead tuples accumulate faster than vacuum can normally reclaim them, so sustained operation depends on vacuum windows opening regularly. Monitoring pg_stat_user_tables.n_dead_tup therefore gives a leading indicator, ahead of queue-depth or latency signals.
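A minimal monitoring sketch for that indicator, again assuming a queue table named `jobs`:

```sql
-- Leading-indicator check for a queue table: dead-tuple count and
-- vacuum recency, from the cumulative statistics views.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'jobs';   -- hypothetical queue table name
```

Alerting on n_dead_tup crossing a workload-specific threshold (tens of thousands, per the lock-time figures above) fires before queue depth or claim latency visibly degrade.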

Bottom-up deletion

Postgres 14 added B-tree bottom-up deletion for version churn — opportunistic cleanup of dead index tuples during ordinary insertions into a leaf page. This raises the floor for queue workloads on modern Postgres but does not replace VACUUM: horizon-pinned tuples still cannot be removed, regardless of how the cleanup is triggered (Source: sources/2026-04-11-planetscale-keeping-a-postgres-queue-healthy).

Seen in

  • sources/2026-04-11-planetscale-keeping-a-postgres-queue-healthy — Dead tuples framed as the core queue-degradation signal on modern Postgres. Demonstrated: linear dead-tuple accumulation during mixed-workload runs even with SKIP LOCKED + batching; dead-tuple count at end (42k after 15 min with Traffic Control disabled at pre-stress loads; 383k with 800 jobs/sec and 3 concurrent overlapping analytics) as the direct cause of lock-time spikes (84.5 ms at 33k, 180 ms at 24k, 300+ ms at 383k).