CONCEPT Cited by 1 source
Dead tuples and VACUUM¶
Definition¶
A dead tuple is a Postgres row version that has been logically
deleted or superseded by an update but not yet physically removed.
In Postgres's MVCC model, a
DELETE sets the row's xmax transaction ID and makes the row
invisible to transactions started after that delete — but the row
stays on disk until VACUUM reclaims its heap page slot and any
index entries that referenced it.
VACUUM is Postgres's garbage collector. It scans heap pages and index pages, identifies tuples older than the MVCC horizon (i.e. no longer visible to any active snapshot), and marks their space reusable. Autovacuum (concepts/postgres-autovacuum) runs it automatically against tables whose dead-tuple count exceeds a threshold.
Why dead tuples have a cost even when invisible¶
Dead tuples aren't returned by SELECT, but Postgres's executor still
encounters them:
- Sequential scans read each tuple off the heap page and check its
visibility metadata (
xmin/xmax) before discarding it. - Index scans are more insidious: the B-tree index itself accumulates leaf entries pointing at dead heap tuples. Each dead index entry forces an additional heap-page lookup that the executor then discards. Every dead tuple adds a hidden I/O per index scan.
For a workload that repeatedly scans an index to pick the oldest pending row — like a job queue — dead-tuple accumulation in the index translates directly into longer lock times and more I/O per job claim. PlanetScale's Griggs measured lock time rising from 2–3 ms baseline to 180 ms spikes at 24k dead tuples on a modest PS-5 cluster (Source: sources/2026-04-11-planetscale-keeping-a-postgres-queue-healthy).
Why dead tuples don't always get reclaimed¶
VACUUM can only reclaim dead tuples older than the MVCC horizon — that is, tuples no active transaction could still see. If a long transaction or a chain of continuously overlapping transactions pins the horizon (concepts/mvcc-horizon), dead tuples accumulate indefinitely even if autovacuum runs on schedule.
The common failure mode is therefore not "autovacuum is too slow" but "autovacuum has nothing it can legally reclaim". The fix is to free the horizon (by limiting workload-class concurrency — see patterns/workload-class-resource-budget), not to tune vacuum aggression.
Relation to write churn¶
In a high-write-churn table like a queue,
every successful job produces one dead tuple (the deleted row). At
thousands of jobs/sec, dead tuples accumulate faster than vacuum can
normally reclaim them — sustained operation depends on vacuum
windows opening regularly. Bloat-monitoring pg_stat_user_tables.n_dead_tup
becomes a leading indicator ahead of queue-depth or latency signals.
Bottom-up deletion¶
Postgres 14+ added B-tree bottom-up deletion for version churn — opportunistic cleanup of dead index tuples during normal insertions into the leaf. This lifts the floor for modern Postgres queue workloads but does not replace VACUUM: horizon-pinned tuples still can't be removed regardless of how the cleanup is triggered (Source: sources/2026-04-11-planetscale-keeping-a-postgres-queue-healthy).
Seen in¶
- sources/2026-04-11-planetscale-keeping-a-postgres-queue-healthy
— Dead tuples framed as the core queue-degradation signal on modern
Postgres. Demonstrated: linear dead-tuple accumulation during
mixed-workload runs even with
SKIP LOCKED+ batching; dead-tuple count at end (42k after 15 min with Traffic Control disabled at pre-stress loads; 383k with 800 jobs/sec and 3 concurrent overlapping analytics) as the direct cause of lock-time spikes (84.5 ms at 33k, 180 ms at 24k, 300+ ms at 383k).