CONCEPT Cited by 1 source

Epoch-based distributed garbage collection¶

Definition¶

Epoch-based distributed garbage collection is a reclamation technique in which:

Each reclaimable object is stamped at creation with a monotonically increasing global counter (a cluster epoch).
Each shard publishes a local watermark M(p) — the highest epoch for which all of its dependencies on objects from that epoch have been resolved.
The clusterwide safe-to-GC epoch is M = min(M(p)) over all shards p. Any object stamped with epoch ≤ M is safe to delete.

This is structurally distinct from reference counting, which asks the inverse question ("does any consumer still reference this specific object?") and requires durable, coordinated, often linearizable reference counts maintained per-object.

The technique is canonicalised by Redpanda Cloud Topics for L0 file reclamation, with the reference-counting alternative explicitly rejected as introducing "an ocean of complexity" (Source: sources/2026-05-19-redpanda-cloud-topics-level-zero-garbage-collection).

The technique vs reference counting¶

The Cloud Topics post walks through the obvious framing then rejects it:

"To a first approximation, [an L0 object] resembles any other shared, read-only resource. We can think of each chunk of unreconciled data as a 'reference' to the object, and the object is safe to delete only once the number of references reaches zero (i.e., the reconciler has lifted all enclosed data to L1). Simple enough, but this framing belies an ocean of complexity."

The complexity:

"First, these reference counts must be durable. Partitions themselves are spread across the cluster to balance load, so we'll need to support updates from anywhere. Do they need to be linearizable? Reference count updates are tightly correlated with the partition-local state that reconcilers use to make progress. Maybe they need to be atomic then? Would we be satisfied with eventual consistency? These are real questions about distributed systems design."

Verdict:

"Redpanda does not track L0 objects this way. Instead, we assign an 'epoch' to every object in L0 (think of it as a coarse-grained logical timestamp) and leverage carefully structured per-partition state to construct a global view of which L0 objects are safe to remove. No central index, no shared state, and no coordinated updates."

The key reframe: "is anyone still referencing this?" (per-object, distributed) becomes "has the cluster moved past this epoch?" (per-epoch, monotonic-bound).

Mechanism¶

                 Object stamping                 Reclamation
                       │                             │
   epoch = E_now       ▼                             ▼
   (cluster-global) ──► obj_id = ⟨E_now, …⟩    if obj.epoch ≤ M:
                                                  delete(obj)
                       │
                       ▼
            Object goes into circulation;
            shards consume / process it.
                       │
                       ▼
   For each shard p:
     M(p) ← highest epoch such that
            shard p's dependencies on
            epoch ≤ M(p) are resolved.

   Clusterwide:
     M = min over all p of M(p)
       (lazy aggregation; see
        [patterns/lazy-aggregate-from-monotonic-local-state](<../patterns/lazy-aggregate-from-monotonic-local-state.md>))

The three load-bearing properties:

Stamping is at creation, immutable. No update path on the object itself. Anti-pattern: revising the stamp later.
M(p) is monotonically non-decreasing per shard. Once a shard publishes M(p) = k, future publications are ≥ k.
Stale observations are conservative-safe. A node operating on a stale view of M(p) for some shard computes a smaller M, which is more conservative — it never deletes something still in use. Latency cost, not correctness cost.

Why no central index, no shared state, no coordinated updates¶

Cloud Topics' canonical slogan: "No central index, no shared state, and no coordinated updates." Each clause maps to a property of the technique:

No central index — there is no per-object reference table. Eligibility is computed from (obj.epoch, M) alone.
No shared state — each shard's M(p) is owned by that shard. No mutable state crosses shard boundaries.
No coordinated updates — M is computed via min over already-disseminated local watermarks, not via a coordination protocol. The dissemination substrate can be best-effort.

This is the operational pay-off: garbage collection becomes a property of the system that runs in the background without contention with the write path or coordination with peers.

Granularity trade-off¶

The fundamental cost: all objects in an epoch are reclaimed together. Granularity is epoch_advance_period worth of objects. Compared to reference counting:

	Reference counting	Epoch-based GC
Reclamation latency	`~processing_time(obj)`	`~epoch_advance_period × N_active_epochs`
Reclamation granularity	Per object	Per cohort (one epoch's worth)
Per-object metadata	Reference count	None (only the stamp)
Failure-mode complexity	High (lost counts → leaks or bad deletes)	Low (lost watermarks → recompute)

The trade is latency-to-reclaim for coordination-cost. If storage is cheap and coordination is expensive (the typical case at distributed-system scale), epoch-based GC wins.

Pre-conditions¶

Epoch-based distributed GC applies cleanly when:

Objects are temporary by design. They have a clear "processed" condition after which they're known to be reclaimable. Cloud Topics L0 files satisfy this (after the Reconciler lifts to L1).
There's an existing dissemination substrate. A periodic metadata-distribution channel exists (e.g. gossip, controller-driven push, periodic broadcast) — no need to add a coordination protocol just for GC.
Per-shard work is linearizable per-shard. Each shard processes its assigned epoch-stamped objects in some order that lets it advance M(p) monotonically. This is the default when shards have local Raft logs or consensus state.
Reclamation latency tolerance is at least one epoch-advance period. If the system needs sub-epoch reclamation latency, the epoch advance frequency would have to be very high, eroding the technique's coordination benefits.

Variants¶

The Cloud Topics instance combines three composed primitives:

patterns/epoch-stamp-on-object-id-for-gc — stamp the epoch in the durable identifier.
patterns/per-partition-rsm-for-gc-tracking — track M(p) in a per-shard replicated state machine (sliding-window epoch tracking is the specific shape).
patterns/lazy-aggregate-from-monotonic-local-state — compute M = min(M(p)) lazily over disseminated watermarks.

Other instantiations could substitute different mechanisms for each — for example, a flat key-value store for M(p) instead of a Raft state machine, or a custom gossip channel instead of an existing dissemination service. The conceptual core is unchanged.

Relationship to epoch-based memory reclamation¶

Classical epoch-based memory reclamation (EBR; Fraser, Hart et al.) in lock-free data structures uses the same monotonic-counter shape, but at the thread / process altitude:

	EBR (process altitude)	Epoch-based distributed GC (cluster altitude)
Stamp on	Memory deallocation events	Object IDs at creation
Local watermark per	Thread (epoch the thread is currently in)	Shard / partition
Aggregation mechanism	Process-local read of all thread epochs	Lazy gossip / dissemination
Reclamation unit	Pointers from old epochs	Objects from old epochs

Cloud Topics' technique generalises EBR to the cluster: per-shard state replaces per-thread state, and a dissemination substrate replaces shared-memory reads.

Seen in¶

sources/2026-05-19-redpanda-cloud-topics-level-zero-garbage-collection — canonical wiki instance. Cloud Topics L0 GC uses cluster-epoch stamping, per-partition Raft RSM for M(p), and lazy aggregate via Redpanda's internal metadata-dissemination service. Reference counting explicitly rejected.

concepts/cluster-epoch — the load-bearing primitive.
concepts/sliding-window-epoch-tracking — the per-partition state-machine shape used to publish M(p) with leadership- change tolerance.
concepts/garbage-collection — the parent concept; this page is one specific GC technique.
concepts/l0-l1-file-compaction-for-object-store-streaming — the file-layout context that motivates the technique in the Cloud Topics instance.
patterns/epoch-stamp-on-object-id-for-gc — the stamping pattern.
patterns/per-partition-rsm-for-gc-tracking — the per-shard state-tracking pattern.
patterns/lazy-aggregate-from-monotonic-local-state — the global-aggregation pattern.
systems/redpanda-cloud-topics — the canonical system instance.