CONCEPT Cited by 1 source
Epoch-based distributed garbage collection¶
Definition¶
Epoch-based distributed garbage collection is a reclamation technique in which:
- Each reclaimable object is stamped at creation with a monotonically increasing global counter (a cluster epoch).
- Each shard publishes a local watermark
M(p)— the highest epoch for which all of its dependencies on objects from that epoch have been resolved. - The clusterwide safe-to-GC epoch is
M = min(M(p)) over all shards p. Any object stamped with epoch≤ Mis safe to delete.
This is structurally distinct from reference counting, which asks the inverse question ("does any consumer still reference this specific object?") and requires durable, coordinated, often linearizable reference counts maintained per-object.
The technique is canonicalised by Redpanda Cloud Topics for L0 file reclamation, with the reference-counting alternative explicitly rejected as introducing "an ocean of complexity" (Source: sources/2026-05-19-redpanda-cloud-topics-level-zero-garbage-collection).
The technique vs reference counting¶
The Cloud Topics post walks through the obvious framing then rejects it:
"To a first approximation, [an L0 object] resembles any other shared, read-only resource. We can think of each chunk of unreconciled data as a 'reference' to the object, and the object is safe to delete only once the number of references reaches zero (i.e., the reconciler has lifted all enclosed data to L1). Simple enough, but this framing belies an ocean of complexity."
The complexity:
"First, these reference counts must be durable. Partitions themselves are spread across the cluster to balance load, so we'll need to support updates from anywhere. Do they need to be linearizable? Reference count updates are tightly correlated with the partition-local state that reconcilers use to make progress. Maybe they need to be atomic then? Would we be satisfied with eventual consistency? These are real questions about distributed systems design."
Verdict:
"Redpanda does not track L0 objects this way. Instead, we assign an 'epoch' to every object in L0 (think of it as a coarse-grained logical timestamp) and leverage carefully structured per-partition state to construct a global view of which L0 objects are safe to remove. No central index, no shared state, and no coordinated updates."
The key reframe: "is anyone still referencing this?" (per-object, distributed) becomes "has the cluster moved past this epoch?" (per-epoch, monotonic-bound).
Mechanism¶
Object stamping Reclamation
│ │
epoch = E_now ▼ ▼
(cluster-global) ──► obj_id = ⟨E_now, …⟩ if obj.epoch ≤ M:
delete(obj)
│
▼
Object goes into circulation;
shards consume / process it.
│
▼
For each shard p:
M(p) ← highest epoch such that
shard p's dependencies on
epoch ≤ M(p) are resolved.
Clusterwide:
M = min over all p of M(p)
(lazy aggregation; see
[patterns/lazy-aggregate-from-monotonic-local-state](<../patterns/lazy-aggregate-from-monotonic-local-state.md>))
The three load-bearing properties:
- Stamping is at creation, immutable. No update path on the object itself. Anti-pattern: revising the stamp later.
M(p)is monotonically non-decreasing per shard. Once a shard publishesM(p) = k, future publications are≥ k.- Stale observations are conservative-safe. A node operating
on a stale view of
M(p)for some shard computes a smallerM, which is more conservative — it never deletes something still in use. Latency cost, not correctness cost.
Why no central index, no shared state, no coordinated updates¶
Cloud Topics' canonical slogan: "No central index, no shared state, and no coordinated updates." Each clause maps to a property of the technique:
- No central index — there is no per-object reference table.
Eligibility is computed from
(obj.epoch, M)alone. - No shared state — each shard's
M(p)is owned by that shard. No mutable state crosses shard boundaries. - No coordinated updates —
Mis computed via min over already-disseminated local watermarks, not via a coordination protocol. The dissemination substrate can be best-effort.
This is the operational pay-off: garbage collection becomes a property of the system that runs in the background without contention with the write path or coordination with peers.
Granularity trade-off¶
The fundamental cost: all objects in an epoch are reclaimed
together. Granularity is epoch_advance_period worth of objects.
Compared to reference counting:
| Reference counting | Epoch-based GC | |
|---|---|---|
| Reclamation latency | ~processing_time(obj) |
~epoch_advance_period × N_active_epochs |
| Reclamation granularity | Per object | Per cohort (one epoch's worth) |
| Per-object metadata | Reference count | None (only the stamp) |
| Failure-mode complexity | High (lost counts → leaks or bad deletes) | Low (lost watermarks → recompute) |
The trade is latency-to-reclaim for coordination-cost. If storage is cheap and coordination is expensive (the typical case at distributed-system scale), epoch-based GC wins.
Pre-conditions¶
Epoch-based distributed GC applies cleanly when:
- Objects are temporary by design. They have a clear "processed" condition after which they're known to be reclaimable. Cloud Topics L0 files satisfy this (after the Reconciler lifts to L1).
- There's an existing dissemination substrate. A periodic metadata-distribution channel exists (e.g. gossip, controller-driven push, periodic broadcast) — no need to add a coordination protocol just for GC.
- Per-shard work is linearizable per-shard. Each shard
processes its assigned epoch-stamped objects in some order
that lets it advance
M(p)monotonically. This is the default when shards have local Raft logs or consensus state. - Reclamation latency tolerance is at least one epoch-advance period. If the system needs sub-epoch reclamation latency, the epoch advance frequency would have to be very high, eroding the technique's coordination benefits.
Variants¶
The Cloud Topics instance combines three composed primitives:
- patterns/epoch-stamp-on-object-id-for-gc — stamp the epoch in the durable identifier.
- patterns/per-partition-rsm-for-gc-tracking — track
M(p)in a per-shard replicated state machine (sliding-window epoch tracking is the specific shape). - patterns/lazy-aggregate-from-monotonic-local-state —
compute
M = min(M(p))lazily over disseminated watermarks.
Other instantiations could substitute different mechanisms for
each — for example, a flat key-value store for M(p) instead of
a Raft state machine, or a custom gossip channel instead of an
existing dissemination service. The conceptual core is unchanged.
Relationship to epoch-based memory reclamation¶
Classical epoch-based memory reclamation (EBR; Fraser, Hart et al.) in lock-free data structures uses the same monotonic-counter shape, but at the thread / process altitude:
| EBR (process altitude) | Epoch-based distributed GC (cluster altitude) | |
|---|---|---|
| Stamp on | Memory deallocation events | Object IDs at creation |
| Local watermark per | Thread (epoch the thread is currently in) | Shard / partition |
| Aggregation mechanism | Process-local read of all thread epochs | Lazy gossip / dissemination |
| Reclamation unit | Pointers from old epochs | Objects from old epochs |
Cloud Topics' technique generalises EBR to the cluster: per-shard state replaces per-thread state, and a dissemination substrate replaces shared-memory reads.
Seen in¶
- sources/2026-05-19-redpanda-cloud-topics-level-zero-garbage-collection
— canonical wiki instance. Cloud Topics L0 GC uses cluster-epoch
stamping, per-partition Raft RSM for
M(p), and lazy aggregate via Redpanda's internal metadata-dissemination service. Reference counting explicitly rejected.
Related¶
- concepts/cluster-epoch — the load-bearing primitive.
- concepts/sliding-window-epoch-tracking — the per-partition
state-machine shape used to publish
M(p)with leadership- change tolerance. - concepts/garbage-collection — the parent concept; this page is one specific GC technique.
- concepts/l0-l1-file-compaction-for-object-store-streaming — the file-layout context that motivates the technique in the Cloud Topics instance.
- patterns/epoch-stamp-on-object-id-for-gc — the stamping pattern.
- patterns/per-partition-rsm-for-gc-tracking — the per-shard state-tracking pattern.
- patterns/lazy-aggregate-from-monotonic-local-state — the global-aggregation pattern.
- systems/redpanda-cloud-topics — the canonical system instance.