CONCEPT Cited by 2 sources

Tombstone (deletion marker)¶

A tombstone is a special marker written in place of a deleted record in an eventually-consistent or append-only store. It says "this key used to exist; do not resurrect it" — without which, a gossip or LSM-tree replication protocol could re-insert a "deleted" value from a lagging replica.

Why deletes can't just delete¶

In a gossip-replicated store (Cassandra, Dynamo, Riak), replicas receive updates at different times. If node A processes DELETE(k), node B is partitioned, and during the partition node C gossips its older copy of k to A, then A has no way to tell the resurrected key from a brand-new write — and reinserts it.

Tombstones fix this by making the delete a first-class value in the causal order: once A has the tombstone, any older-version write for k loses the (generation, version) merge.

"The tombstone is a special entry to invalidate the data entries that have a matching key without actual deletion of the data. The gossip protocol deletes the data from a node using a tombstone." (Source: sources/2023-07-16-highscalability-gossip-protocol-explained)

Tombstone lifecycle¶

Write — an UPDATE or PUT with a delete marker and a timestamp ≥ any concurrent write it must shadow.
Gossip propagation — the tombstone is just another versioned key-value; it replicates the same way live values do.
Compaction / GC — tombstones can be physically evicted only after a grace period long enough that every replica has definitely seen them. (Cassandra's gc_grace_seconds, default 10 days — if a partitioned node comes back after this window, it can zombie-resurrect deleted data.)

Costs¶

Storage amplification — tombstones accumulate until GC fires.
Read amplification — scans across a key range must filter tombstoned entries (Cassandra's infamous "tombstone warnings" in slow queries).
Compaction work — LSM compactors must carry tombstones forward through level merges until the grace period elapses.

Tombstone alternatives in CRDTs¶

CRDT designs often encode deletion differently — as a negative increment (counter CRDTs), a removal set (OR-sets), or a causal context (epoch-marked writes). These avoid explicit tombstones but pay in metadata instead. See concepts/crdt for the broader design space; systems/cr-sqlite (underneath Fly's Corrosion) uses per-row causal metadata rather than a tombstone column.

Seen in¶

sources/2023-07-16-highscalability-gossip-protocol-explained — definitional source within the gossip-protocol framework.
sources/2024-09-19-netflix-netflixs-key-value-data-abstraction-layer — Netflix KV DAL names Cassandra's item-level-delete tombstone cost as a primary motivation for its TTL-with-jitter mitigation: "some storage engines (any store which defers true deletion) such as Cassandra struggle with high volumes of deletes due to tombstone and compaction overhead." Record-level and range deletes on KV DAL explicitly collapse to one tombstone; only item-level deletes fall back to TTL-jitter. Introduces the doomstone failure mode: a write with a future-dated idempotency-token timestamp on a store sensitive to future timestamps becomes an immutable "deletion record in the future" that blocks legitimate writes — KV DAL's server-side drift rejection is the guard.

concepts/gossip-protocol
concepts/eventual-consistency
concepts/crdt
concepts/lsm-compaction
concepts/ttl-based-deletion-with-jitter — the Netflix-KV-DAL mitigation for high-volume item-level-delete tombstone load.
concepts/idempotency-token — future-dated tokens create doomstones on stores that treat deletion markers with future timestamps as un-overwriteable.
systems/apache-cassandra
systems/netflix-kv-dal