CONCEPT Cited by 1 source

Coordinated compaction¶

Definition¶

Coordinated compaction is Redpanda's protocol-level fix for the compaction–replication race in log-compacted topics. Instead of each broker making independent, time-based decisions about when to delete tombstones and transaction control batches, replicas coordinate via a leader-driven watermark protocol to ensure no metadata record is removed until every replica has compacted past the data it governs.

"To behave correctly even during prolonged broker outages or slowness, Redpanda runs a small coordination protocol on top of compaction that keeps a tombstone or control marker in place until every replica has compacted the associated data records." (Source: sources/2026-06-25-redpanda-kafkas-log-compaction-corrupts-data)

Design principle¶

"Correctness is a guarantee, compaction is best-effort." (Source: sources/2026-06-25-redpanda-kafkas-log-compaction-corrupts-data)

If a replica stays offline indefinitely, MTRO and MXRO do not advance, so removal of tombstones and transaction markers pauses on every replica of the affected partitions. Storage accumulates, but data safety is never compromised. Once the replica rejoins and compacts, cleanup resumes automatically.

Watermarks¶

For tombstone removal¶

| Watermark | Scope | Definition |
|---|---|---|
| MCCO (Maximum Cleanly Compacted Offset) | Per-replica | Offset up to which this replica's log has been cleanly compacted: no duplicate keys below this point |
| MTRO (Maximum Tombstone Removal Offset) | Per-replica-set | `min(all MCCOs)`: the offset below which tombstones are safe to remove on any replica |
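
The relationship between the two watermarks can be sketched in a few lines of Python. This is an illustration only, not Redpanda's implementation: the function names and replica labels are hypothetical, and treating MTRO as an exclusive bound (a tombstone exactly at MTRO is kept) is an assumption drawn from the phrase "below which".

```python
from typing import Mapping


def compute_mtro(mccos: Mapping[str, int]) -> int:
    """Return min(all MCCOs) for one replica set (hypothetical helper)."""
    if not mccos:
        raise ValueError("replica set must not be empty")
    return min(mccos.values())


def tombstone_removable(tombstone_offset: int, mtro: int) -> bool:
    """A tombstone may be dropped only if it sits strictly below MTRO.

    Assumption: MTRO is an exclusive upper bound.
    """
    return tombstone_offset < mtro


# Illustrative MCCO reports from three replicas of one partition.
mccos = {"replica-1": 500, "replica-2": 320, "replica-3": 410}
mtro = compute_mtro(mccos)  # the slowest replica (320) sets the bound
```

With these values, a tombstone at offset 319 would be removable on every replica, while one at offset 400 would be kept even on `replica-1`, which has already compacted past it locally.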

For transaction marker removal¶

| Watermark | Scope | Definition |
|---|---|---|
| MXFO (Maximum Transaction-Free Offset) | Per-replica | Offset up to which all transactions are fully resolved (committed or aborted) |
| MXRO (Maximum Transaction-Marker Removal Offset) | Per-replica-set | `min(all MXFOs)`: the offset below which COMMIT/ABORT markers are safe to remove |
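
As a rough illustration of how the transaction watermarks fit together, the Python sketch below derives a per-replica MXFO and the replica-set MXRO. The source does not describe how a replica computes its MXFO; the rule used here (everything below the earliest still-open transaction counts as resolved) is an assumption, and all names are hypothetical.

```python
from typing import Iterable, Mapping


def compute_mxfo(open_txn_first_offsets: Iterable[int], log_end_offset: int) -> int:
    """Assumed rule: MXFO is the first offset of the earliest open transaction,
    or the log end offset when no transaction is open."""
    return min(open_txn_first_offsets, default=log_end_offset)


def compute_mxro(mxfos: Mapping[str, int]) -> int:
    """Return min(all MXFOs) for one replica set, as defined in the table."""
    if not mxfos:
        raise ValueError("replica set must not be empty")
    return min(mxfos.values())


# One replica still has open transactions starting at offsets 700 and 650.
mxfo_r1 = compute_mxfo([700, 650], log_end_offset=1000)
# Another replica has no open transactions at all.
mxfo_r2 = compute_mxfo([], log_end_offset=1000)

mxro = compute_mxro({"replica-1": mxfo_r1, "replica-2": mxfo_r2})
```

Under these assumptions, COMMIT/ABORT markers below `mxro` are candidates for removal, and the replica with unresolved transactions holds the bound back for the whole set.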

Protocol phases¶

Collection — the partition leader periodically asks each follower: "What's your MCCO/MXFO?"
Distribution — the leader computes MTRO = min(all MCCOs) and MXRO = min(all MXFOs), then pushes these values back to every replica.
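
The two phases can be sketched as a single round in Python. This is a minimal in-memory model under stated assumptions: the `Replica` class, its `report` and `apply` methods, and `run_round` are hypothetical names, the leader is assumed to include its own values in the computation, and real replicas would exchange these values over RPC.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Replica:
    name: str
    mcco: int       # locally computed
    mxfo: int       # locally computed
    mtro: int = 0   # last value received from the leader
    mxro: int = 0   # last value received from the leader

    def report(self) -> Tuple[int, int]:
        """Collection: answer the leader's 'what is your MCCO/MXFO?' query."""
        return self.mcco, self.mxfo

    def apply(self, mtro: int, mxro: int) -> None:
        """Distribution: accept new bounds, never moving them backward."""
        self.mtro = max(self.mtro, mtro)
        self.mxro = max(self.mxro, mxro)


def run_round(replicas: List[Replica]) -> Tuple[int, int]:
    """One leader-driven round over the whole replica set (leader included)."""
    reports = [r.report() for r in replicas]       # collection phase
    mtro = min(mcco for mcco, _ in reports)
    mxro = min(mxfo for _, mxfo in reports)
    for r in replicas:                              # distribution phase
        r.apply(mtro, mxro)
    return mtro, mxro


replicas = [
    Replica("leader", mcco=500, mxfo=480),
    Replica("follower-1", mcco=320, mxfo=450),
    Replica("follower-2", mcco=410, mxfo=300),
]
result = run_round(replicas)
```

After the round, every replica holds the same removal bounds, each set by whichever replica is furthest behind on that particular watermark.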

Invariants¶

MTRO/MXRO never go backward — once a cleanup decision is made, it's permanent. Late RPCs from previous leaders are ignored.
MCCO/MXFO only move forward — once data is cleanly compacted, it stays compacted.
Offline replicas freeze progress — their last-known MCCO/MXFO is used in the min computation, preventing MTRO/MXRO from advancing past them.
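
A small Python sketch of how these invariants might be enforced follows. The source does not say how late RPCs from previous leaders are detected; comparing a Raft-style leader term is an assumption made here for illustration, and both class names are hypothetical.

```python
from typing import Dict, Iterable


class FollowerWatermark:
    """Follower-side view of MTRO (MXRO would be handled the same way)."""

    def __init__(self) -> None:
        self.term = 0
        self.mtro = 0

    def on_update(self, leader_term: int, mtro: int) -> bool:
        """Apply an update from a leader; return True if MTRO advanced."""
        if leader_term < self.term:
            return False              # late RPC from a previous leader: ignore
        self.term = leader_term
        if mtro <= self.mtro:
            return False              # MTRO never goes backward
        self.mtro = mtro
        return True


class LeaderTracker:
    """Leader-side record of the last-known MCCO per replica."""

    def __init__(self, replica_names: Iterable[str]) -> None:
        self.last_known: Dict[str, int] = {name: 0 for name in replica_names}

    def record(self, name: str, mcco: int) -> None:
        # MCCO only moves forward; a lower report is treated as stale.
        self.last_known[name] = max(self.last_known[name], mcco)

    def mtro(self) -> int:
        # Offline replicas simply keep their last-known value in the min.
        return min(self.last_known.values())


tracker = LeaderTracker(["a", "b", "c"])
for name, mcco in [("a", 500), ("b", 300), ("c", 400)]:
    tracker.record(name, mcco)

# Replica "b" goes offline; only "a" and "c" keep reporting progress.
tracker.record("a", 900)
tracker.record("c", 800)
frozen_mtro = tracker.mtro()  # still bounded by b's last-known MCCO
```

Because every value is monotonic, a stale reading can only make the computed bound lower than it needs to be, which delays cleanup but cannot remove a record too early.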

Edge cases¶

Leadership changes: the new leader uses the existing MTRO as its starting point, collects fresh MCCOs, and re-broadcasts the result even if the value is unchanged, since followers may have missed the last update.
Replica added: its MCCO is initialised to the group's current MTRO. This is correct because the new replica receives its log from a replica that has already compacted to that point.
Replica removed: its MCCO drops from the min computation, potentially advancing MTRO.
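
The membership cases above can be sketched in Python as follows. This is an illustrative model only: `ReplicaSetWatermarks` and its methods are hypothetical names, and only the tombstone watermarks are modelled (the source describes these edge cases in terms of MCCO and MTRO).

```python
from typing import Dict, Mapping


class ReplicaSetWatermarks:
    """Leader-side MCCO bookkeeping across membership changes."""

    def __init__(self, mccos: Mapping[str, int]) -> None:
        if not mccos:
            raise ValueError("replica set must not be empty")
        self.mccos: Dict[str, int] = dict(mccos)
        self.mtro = min(self.mccos.values())

    def _refresh(self) -> None:
        # MTRO never goes backward, even if the min computation would.
        self.mtro = max(self.mtro, min(self.mccos.values()))

    def add_replica(self, name: str) -> None:
        # A new replica starts at the group's current MTRO.
        self.mccos[name] = self.mtro
        self._refresh()

    def remove_replica(self, name: str) -> None:
        if len(self.mccos) == 1:
            raise ValueError("cannot remove the last replica")
        # The removed replica's MCCO no longer constrains the min.
        del self.mccos[name]
        self._refresh()


group = ReplicaSetWatermarks({"a": 500, "b": 300, "c": 400})
mtro_before = group.mtro     # bounded by the slowest replica, "b"

group.remove_replica("b")
mtro_after_remove = group.mtro

group.add_replica("d")
mtro_after_add = group.mtro
```

In this run, removing the slowest replica lets MTRO advance from 300 to 400, and the newly added replica joins at 400, so it does not pull the bound back down.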

Architectural analogy¶

Structurally similar to epoch-based distributed GC used in Redpanda's Cloud Topics L0 garbage collection — both use leader-driven aggregation of per-shard monotonic watermarks to determine safe-to-delete thresholds, with staleness always being conservative-safe due to monotonicity.

Seen in¶

sources/2026-06-25-redpanda-kafkas-log-compaction-corrupts-data — canonical disclosure of the coordinated compaction protocol with full watermark definitions, protocol phases, and edge-case handling.

concepts/log-compaction — the mechanism coordinated compaction governs
concepts/compaction-replication-race — the bug this protocol fixes
patterns/coordinated-compaction-protocol — the named pattern
patterns/leader-driven-safe-deletion-watermark — the watermark aggregation shape
concepts/epoch-based-distributed-gc — analogous distributed GC technique used in Cloud Topics
concepts/in-sync-replica-set — replica availability tracking
systems/redpanda