PATTERN Cited by 1 source
Epoch stamp on object ID for GC
Pattern
When designing a system that produces temporary objects with unbounded peak counts and a clear "processed everywhere" condition, embed a cluster-global monotonic counter (a cluster epoch) directly into the object's durable identifier at creation time: not as a side-channel attribute, not in a separate metadata store, but in the ID itself. Garbage collection then becomes a simple comparison between the stamp encoded in the ID and a cluster-wide safe-to-GC watermark, with no per-object state required to make the deletion decision.
Canonicalised by Redpanda Cloud Topics for L0 file reclamation:
"The cluster epoch is a monotonically increasing counter that we embed in every L0 object ID at creation time." (Source: sources/2026-05-19-redpanda-cloud-topics-level-zero-garbage-collection)
Mechanics
Stamping (creation):

    epoch  = current_cluster_epoch()     ← from local cache; slightly stale OK
    obj_id = encode(⟨epoch, suffix⟩)     ← e.g. "1234/abc-xyz", or epoch in path prefix
    write_object(obj_id, payload)

Reclamation (decision is local, no per-object metadata):

    M = current_cluster_safe_to_gc()     ← from gossip / metadata dissemination
    for obj in list_objects(prefix):
        ⟨e, _⟩ = decode(obj.id)
        if e ≤ M:
            delete(obj)

Three properties make this work:
- Stamping is at creation, immutable. Once written, the object's epoch is fixed. The ID itself encodes the GC-relevant state.
- Decision is local-only. Deciding whether to delete needs only (obj.epoch, M), both readable without any cross-partition lookup.
- No per-object metadata table. Compare to reference-counting, which requires a per-object durable count somewhere.
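The stamping and reclamation steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Cloud Topics implementation: `InMemoryStore`, `encode_object_id`, `decode_object_id`, and `reclaim` are hypothetical names, the `"<epoch>/<suffix>"` layout follows the example in the pseudocode, and a dict stands in for the object store.

```python
import uuid


def encode_object_id(epoch: int, suffix: str) -> str:
    """Build an ID of the form "<epoch>/<suffix>" (hypothetical layout)."""
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    return f"{epoch}/{suffix}"


def decode_object_id(obj_id: str) -> tuple[int, str]:
    """Recover (epoch, suffix) from an ID produced by encode_object_id."""
    epoch_text, _, suffix = obj_id.partition("/")
    return int(epoch_text), suffix


class InMemoryStore:
    """Stand-in for an object store: a dict from object ID to payload."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def write(self, cached_epoch: int, payload: bytes) -> str:
        # Stamp at creation: the epoch becomes part of the immutable ID.
        obj_id = encode_object_id(cached_epoch, uuid.uuid4().hex)
        self.objects[obj_id] = payload
        return obj_id

    def list_ids(self) -> list[str]:
        return list(self.objects)

    def delete(self, obj_id: str) -> None:
        self.objects.pop(obj_id, None)


def reclaim(store: InMemoryStore, safe_to_gc: int) -> list[str]:
    """Delete every object whose stamped epoch is <= the watermark M."""
    deleted = []
    for obj_id in store.list_ids():
        epoch, _ = decode_object_id(obj_id)
        if epoch <= safe_to_gc:  # the whole decision: (obj.epoch, M)
            store.delete(obj_id)
            deleted.append(obj_id)
    return deleted


store = InMemoryStore()
old_id = store.write(cached_epoch=7, payload=b"a")
new_id = store.write(cached_epoch=9, payload=b"b")
deleted = reclaim(store, safe_to_gc=8)
```

With a watermark of 8, only the object stamped with epoch 7 is removed. The sweep consults no table other than the listing itself.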
Stamp placement choices
The pattern is silent on where in the ID the epoch goes, but the practical choices have operational consequences:
| Placement | Pros | Cons |
|---|---|---|
| Path prefix (e.g. /epoch=1234/obj.bin) | Listing by epoch is a directory scan; lifecycle policies can target by prefix | Fixed taxonomy; no flat layout |
| ID prefix bytes (e.g. 1234-abc-xyz) | Prefix listing groups by epoch; path-agnostic | Sortable but coarse |
| Embedded structured field | Explicit, parseable; allows tooling | Requires decode step |
| Side-channel attribute | Maximum flexibility | Loses the "in the ID itself" property — needs metadata store |
The Cloud Topics post does not disclose its specific format, only that the epoch is "in every L0 object ID." For object stores where prefix listing is cheap (S3, GCS, ADLS), placing the epoch in the path prefix gives epoch-grouped listing for free.
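One practical detail of the path-prefix placement can be sketched as follows. The zero-padding, the `EPOCH_WIDTH` constant, and the helper names are assumptions made for this illustration; the source does not describe them. The sketch also assumes the store returns listings in lexicographic key order, in which case padded epochs sort numerically and a sweep can stop at the first key above the watermark.

```python
EPOCH_WIDTH = 20  # assumption: enough digits for any unsigned 64-bit counter


def key_for(epoch: int, name: str) -> str:
    """Hypothetical key layout: zero-padded epoch as the path prefix."""
    return f"epoch={epoch:0{EPOCH_WIDTH}d}/{name}"


def epoch_of(key: str) -> int:
    """Parse the epoch back out of the first path segment."""
    head, _, _ = key.partition("/")
    return int(head.removeprefix("epoch="))


def keys_at_or_below(sorted_keys: list[str], watermark: int) -> list[str]:
    """Collect reclaimable keys from a lexicographically sorted listing."""
    reclaimable = []
    for key in sorted_keys:
        if epoch_of(key) > watermark:
            break  # padded epochs sort numerically, so nothing later qualifies
        reclaimable.append(key)
    return reclaimable


keys = sorted(
    key_for(epoch, name)
    for epoch, name in [(10, "a.bin"), (9, "b.bin"), (100, "c.bin"), (9, "a.bin")]
)
reclaimable = keys_at_or_below(keys, watermark=10)
```

Without padding, the string `epoch=10/...` sorts before `epoch=9/...`, so the early exit would be unsafe. That is one way to read the "sortable but coarse" caveat in the table.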
Why "in the ID itself" matters
The pattern's load-bearing property is that the epoch is recoverable from any reference to the object:
- A storage path (s3://bucket/epoch=1234/...): recover by parsing.
- A metadata pointer (e.g. a placeholder batch in a Raft log): recover by decoding.
- A listing call against object storage: recover from each returned key.
This means there is no situation where the system holds a reference to the object but doesn't know its epoch. Compare a side-channel attribute (e.g. an S3 object tag): if the attribute is lost or the metadata store is unavailable, GC stalls. The in-ID stamp is always retrievable.
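The recovery property can be illustrated with one parser that serves all three kinds of reference. This is a sketch under assumptions: the `epoch=<n>/` path segment follows the example path above, and the pointer record with an `object_key` field is a hypothetical stand-in for whatever a placeholder batch actually stores.

```python
import re
from urllib.parse import urlsplit

# Matches an "epoch=<digits>/" path segment at the start of a key or after a "/".
_EPOCH_SEGMENT = re.compile(r"(?:^|/)epoch=(\d+)/")


def epoch_from_key(key: str) -> int:
    """Recover the epoch from an object key returned by a listing call."""
    match = _EPOCH_SEGMENT.search(key)
    if match is None:
        raise ValueError(f"no epoch segment in {key!r}")
    return int(match.group(1))


def epoch_from_uri(uri: str) -> int:
    """Recover the epoch from a full storage URI such as s3://bucket/..."""
    return epoch_from_key(urlsplit(uri).path)


def epoch_from_pointer(pointer: dict) -> int:
    """Recover the epoch from a metadata pointer (hypothetical record shape)."""
    return epoch_from_key(pointer["object_key"])


from_uri = epoch_from_uri("s3://bucket/epoch=1234/obj.bin")
from_listing = epoch_from_key("epoch=1234/obj.bin")
from_pointer = epoch_from_pointer({"object_key": "epoch=1234/obj.bin", "offset": 0})
```

All three paths end in the same pure function over a string, so none of them depends on a metadata store being reachable.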
Listing-driven deletion sweep
The pattern composes naturally with a deletion sweep that periodically lists objects, reads M, and deletes everything with epoch ≤ M. No per-object tracking, no scheduling table, no cursor: the listing is the cursor, the decision is local, and the sweep is idempotent (re-listing after a crash gives the same answer, at worst against a smaller, staler M, which only defers deletions).
The Cloud Topics post forward-references this: "Stay tuned for part 2, where we discuss how the garbage collector's design enables us to continually delete thousands of L0 objects without any locally persistent state, explicit coordination, or wasted work." The "without any locally persistent state" property is made possible by the in-ID stamp.
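The idempotence claim can be checked with a small simulation. This is a sketch, not the collector described in the post: `sweep` and its `fail_after` crash hook are hypothetical, object IDs use an assumed `"<epoch>/<suffix>"` layout, and a dict stands in for object storage.

```python
from typing import Optional


def sweep(objects: dict[str, bytes], watermark: int,
          fail_after: Optional[int] = None) -> int:
    """One pass: list, decide locally, delete. Returns the number deleted.

    fail_after simulates a crash once that many deletions have happened.
    """
    deleted = 0
    for obj_id in sorted(objects):  # sorted() copies the keys, so deleting is safe
        epoch = int(obj_id.split("/", 1)[0])
        if epoch > watermark:
            continue
        if fail_after is not None and deleted == fail_after:
            raise RuntimeError("simulated crash mid-sweep")
        del objects[obj_id]
        deleted += 1
    return deleted


objects = {"3/a": b"", "4/b": b"", "5/c": b"", "9/d": b""}

try:
    sweep(objects, watermark=5, fail_after=1)  # crashes after one deletion
except RuntimeError:
    pass
after_crash = sorted(objects)

resumed = sweep(objects, watermark=5)  # restart with no saved cursor
repeated = sweep(objects, watermark=5)  # a further pass finds nothing to do
remaining = sorted(objects)
```

The restarted sweep keeps no record of where the first one stopped. Whatever the listing still returns is exactly the remaining work.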
Compatibility constraints
The pattern works cleanly when:
- Object storage supports listing. S3, GCS, ADLS all do. Raw block storage does not.
- The epoch namespace is monotonic and durable. A cluster-wide counter advanced by some cluster-epoch mechanism, not a per-broker local counter that could go backwards.
- The reclamation decision can tolerate epoch granularity. All objects in epoch E are reclaimed together once E ≤ M. If finer-grained per-object lifecycle is required, compose with per-object policies (e.g. retention overrides for specific objects).
Anti-patterns
- Side-channel epoch tag instead of in-ID stamp. Shifts the "recovery from any reference" property to the metadata store, defeating the always-available promise.
- Per-broker monotonic counter (not cluster-global). Reintroduces the cross-broker coordination the pattern was designed to avoid: now M(p) would have to be a 2-D quantity M(p, broker).
- Stamping later than creation. A two-phase "create-then-stamp" introduces a window in which an object exists without an epoch, defeating the recovery property.
- Mutable epoch. Updating an existing object's epoch reintroduces coordinated update mechanics — the whole point of the technique was to avoid this.
- Sub-epoch granularity reclamation. Trying to reclaim some objects of epoch E while keeping others falls back to per-object lifecycle, defeating the cohort-reclamation property.
Relationship to other ID-encoded properties
The pattern is in a family of "encode lifecycle-relevant state in the durable identifier" techniques:
| Pattern | What's encoded | Decision it enables |
|---|---|---|
| Epoch stamp on object ID | Cluster epoch at creation | GC by epoch comparison |
| Snowflake ID | Timestamp + worker ID | Time-ordered ID generation |
| ULID | Timestamp prefix | Lexicographic-time-ordered IDs |
| Time-bucketed S3 prefix | Time bucket | Lifecycle policies by prefix |
| Tenant ID in object key | Tenant | Tenant-scoped operations |
The shared meta-pattern: the ID encodes enough state that operational decisions don't require external lookups.
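As an illustration of the same meta-pattern in another family member, the sketch below packs and unpacks a Snowflake-style ID using the commonly documented Twitter layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence, custom epoch 1288834974657). The helper names are hypothetical, and `older_than` is an assumed example of a decision made from the ID alone.

```python
TWITTER_EPOCH_MS = 1288834974657  # custom epoch of the Twitter Snowflake layout
TIMESTAMP_SHIFT = 22              # 10 worker bits + 12 sequence bits
WORKER_SHIFT = 12                 # 12 sequence bits


def make_snowflake(timestamp_ms: int, worker_id: int, sequence: int) -> int:
    """Pack timestamp, worker ID, and sequence into one integer ID."""
    if not (0 <= worker_id < (1 << 10)) or not (0 <= sequence < (1 << 12)):
        raise ValueError("worker_id or sequence out of range")
    offset_ms = timestamp_ms - TWITTER_EPOCH_MS
    return (offset_ms << TIMESTAMP_SHIFT) | (worker_id << WORKER_SHIFT) | sequence


def snowflake_timestamp_ms(snowflake: int) -> int:
    """Recover the creation time from the ID, with no external lookup."""
    return (snowflake >> TIMESTAMP_SHIFT) + TWITTER_EPOCH_MS


def older_than(snowflake: int, cutoff_ms: int) -> bool:
    """A time-based decision that needs only the ID and a cutoff."""
    return snowflake_timestamp_ms(snowflake) < cutoff_ms


sf = make_snowflake(timestamp_ms=1_700_000_000_000, worker_id=3, sequence=7)
created_ms = snowflake_timestamp_ms(sf)
```

The shape matches the epoch stamp: a monotonic quantity occupies the high-order part of the ID, and a single comparison against a cutoff drives the decision.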
Seen in
- sources/2026-05-19-redpanda-cloud-topics-level-zero-garbage-collection: canonical wiki instance. "...we embed in every L0 object ID at creation time." The format isn't disclosed, but the recovery property (that the epoch is retrievable from any reference to the L0 object) is load-bearing for the GC mechanism.
Related
- concepts/cluster-epoch — the primitive being stamped.
- concepts/epoch-based-distributed-gc — the GC technique this pattern is half of (the stamping half).
- concepts/sliding-window-epoch-tracking — the per-shard state machine that publishes the watermark this pattern's decision compares against.
- patterns/per-partition-rsm-for-gc-tracking — sibling pattern: the local-state half of the GC technique.
- patterns/lazy-aggregate-from-monotonic-local-state — sibling pattern: the global-aggregation half.
- systems/redpanda-cloud-topics — the canonical system instance.