PATTERN Cited by 1 source

Epoch stamp on object ID for GC

Pattern

When designing a system that produces temporary objects with unbounded peak counts and a clear "processed everywhere" condition, embed a cluster-global monotonic counter (a cluster epoch) directly into the object's durable identifier at creation time: not as a side-channel attribute, not in a separate metadata store, but in the ID itself. Garbage collection then becomes a simple comparison between the stamp encoded in the ID and a cluster-wide safe-to-GC watermark, with no per-object state required to make the deletion decision.

Canonicalised by Redpanda Cloud Topics for L0 file reclamation:

"The cluster epoch is a monotonically increasing counter that we embed in every L0 object ID at creation time." (Source: sources/2026-05-19-redpanda-cloud-topics-level-zero-garbage-collection)

Mechanics

Stamping (creation):

   epoch = current_cluster_epoch()        ← from local cache;
                                            slightly stale OK
   obj_id = encode(⟨epoch, suffix⟩)        ← e.g. "1234/abc-xyz"
                                              or epoch in path prefix
   write_object(obj_id, payload)

Reclamation (decision is local, no per-object metadata):

   M = current_cluster_safe_to_gc()        ← from gossip / metadata
                                              dissemination
   for obj in list_objects(prefix):
     ⟨e, _⟩ = decode(obj.id)
     if e ≤ M:
       delete(obj)

Three properties make this work:

Stamping is at creation, immutable. Once written, the object's epoch is fixed. The ID itself encodes the GC-relevant state.
Decision is local-only. Deciding whether to delete needs only (obj.epoch, M) — both readable without any cross-partition lookup.
No per-object metadata table. Compare to reference-counting, which requires a per-object durable count somewhere.
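
The stamping and reclamation steps above can be sketched in Python as follows. This is a minimal illustration, not the Cloud Topics implementation: the `ObjectId` type, the `epoch/suffix` string layout, the dict standing in for object storage, and the helper names are all assumptions made for the example.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: the stamp cannot change after creation
class ObjectId:
    epoch: int
    suffix: str

    def encode(self) -> str:
        # Hypothetical layout "<epoch>/<suffix>", e.g. "1234/abc-xyz".
        return f"{self.epoch}/{self.suffix}"

    @staticmethod
    def decode(raw: str) -> "ObjectId":
        epoch_text, suffix = raw.split("/", 1)
        return ObjectId(int(epoch_text), suffix)


def stamp_new_object(store: dict, cached_epoch: int, payload: bytes) -> str:
    """Create an object whose ID carries the (possibly slightly stale) epoch."""
    obj_id = ObjectId(cached_epoch, uuid.uuid4().hex).encode()
    store[obj_id] = payload  # stand-in for write_object(obj_id, payload)
    return obj_id


def is_reclaimable(raw_id: str, safe_to_gc: int) -> bool:
    """Local decision: needs only the ID and the watermark M."""
    return ObjectId.decode(raw_id).epoch <= safe_to_gc
```

Note that `is_reclaimable` takes no store handle and no metadata lookup: the two inputs named in the second property are the only inputs.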

Stamp placement choices

The pattern is silent on where in the ID the epoch goes, but the practical choices have operational consequences:

Placement	Pros	Cons
Path prefix (e.g. `/epoch=1234/obj.bin`)	Listing by epoch is a directory scan; lifecycle policies can target by prefix	Fixed taxonomy; no flat layout
ID prefix bytes (e.g. `1234-abc-xyz`)	Prefix listing groups by epoch; path-agnostic	Sortable but coarse
Embedded structured field	Explicit, parseable; allows tooling	Requires decode step
Side-channel attribute	Maximum flexibility	Loses the "in the ID itself" property — needs metadata store

The Cloud Topics post does not disclose its specific format, only that the epoch is "in every L0 object ID." For object stores where prefix listing is cheap (S3, GCS, ADLS), placing the epoch in the path prefix gives epoch-grouped listing at no extra cost.
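
To make the first two placements concrete, here is a hedged sketch of both key layouts. The fixed-width zero padding is an assumption added for the example (the source does not specify a format): it makes lexicographic key order agree with numeric epoch order, which matters when a store returns keys in sorted order. The function names and the width of 20 digits are hypothetical.

```python
EPOCH_WIDTH = 20  # assumption: wide enough for a 64-bit counter in decimal


def path_prefix_key(epoch: int, name: str) -> str:
    """Path-prefix placement, e.g. 'epoch=00000000000000001234/obj.bin'."""
    return f"epoch={epoch:0{EPOCH_WIDTH}d}/{name}"


def id_prefix_key(epoch: int, suffix: str) -> str:
    """ID-prefix placement, e.g. '00000000000000001234-abc-xyz'."""
    return f"{epoch:0{EPOCH_WIDTH}d}-{suffix}"


def epoch_of(key: str) -> int:
    """Recover the epoch from either layout by parsing the key alone."""
    if key.startswith("epoch="):
        return int(key[len("epoch="):].split("/", 1)[0])
    return int(key.split("-", 1)[0])
```

Without padding, `"epoch=10/..."` sorts before `"epoch=9/..."`, so a sorted listing would not walk epochs in order.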

Why "in the ID itself" matters

The pattern's load-bearing property is that the epoch is recoverable from any reference to the object:

A storage path (s3://bucket/epoch=1234/...) — recover by parsing.
A metadata pointer (e.g. a placeholder batch in a Raft log) — recover by decoding.
A listing call against object storage — recover from each returned key.

This means there is no situation where the system holds a reference to the object but doesn't know its epoch. Compare a side-channel attribute (e.g. an S3 object tag): if the attribute is lost or the metadata store is unavailable, GC stalls. The in-ID stamp is always retrievable.
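
As an illustration of the recovery property, the sketch below extracts the epoch from each of the three kinds of reference listed above. It assumes the path-prefix layout `epoch=<n>/...`; the `PlaceholderPointer` type and its fields are hypothetical stand-ins for whatever a metadata pointer actually contains.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


def epoch_from_key(key: str) -> int:
    """Parse the epoch out of an object key such as 'epoch=1234/obj.bin'."""
    for segment in key.strip("/").split("/"):
        if segment.startswith("epoch="):
            return int(segment[len("epoch="):])
    raise ValueError(f"no epoch segment in {key!r}")


def epoch_from_uri(uri: str) -> int:
    """Storage path: drop scheme and bucket, then parse the key."""
    return epoch_from_key(urlsplit(uri).path)


@dataclass(frozen=True)
class PlaceholderPointer:  # hypothetical shape of a metadata pointer
    object_key: str
    byte_offset: int


def epoch_from_pointer(pointer: PlaceholderPointer) -> int:
    """Metadata pointer: the key it carries already contains the epoch."""
    return epoch_from_key(pointer.object_key)


def epochs_from_listing(keys: list) -> list:
    """Listing call: every returned key yields its epoch directly."""
    return [epoch_from_key(k) for k in keys]
```

None of these functions touches a tag store or metadata service, which is the contrast with the side-channel approach.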

Listing-driven deletion sweep

The pattern composes naturally with a deletion sweep that periodically lists objects, reads M, and deletes everything with epoch ≤ M. There is no per-object tracking, no scheduling table, and no cursor: the listing is the cursor, the decision is local, and the sweep is idempotent. Re-listing after a crash reaches the same decisions for the same M, and a sweep that restarts with a stale (smaller) M only deletes less, never more.

The Cloud Topics post forward-references this: "Stay tuned for part 2, where we discuss how the garbage collector's design enables us to continually delete thousands of L0 objects without any locally persistent state, explicit coordination, or wasted work." The "without any locally persistent state" property is made possible by the in-ID stamp.
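
Part 2 of the post is not yet available, so the following is only a sketch of what a stateless, listing-driven sweep could look like under the assumptions of this page. `InMemoryStore` is a hypothetical stand-in for an object store, and keys use an assumed `<epoch>/<suffix>` layout.

```python
from typing import Dict, List


class InMemoryStore:
    """Hypothetical stand-in for an object store with list and delete."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, payload: bytes) -> None:
        self._objects[key] = payload

    def list_keys(self, prefix: str = "") -> List[str]:
        # Returns a snapshot, so deleting while iterating is safe.
        return sorted(k for k in self._objects if k.startswith(prefix))

    def delete(self, key: str) -> None:
        self._objects.pop(key, None)  # deleting a missing key is a no-op


def sweep(store: InMemoryStore, safe_to_gc: int, prefix: str = "") -> List[str]:
    """One pass: list, decode the epoch from each key, delete if epoch <= M."""
    deleted = []
    for key in store.list_keys(prefix):
        epoch = int(key.split("/", 1)[0])
        if epoch <= safe_to_gc:
            store.delete(key)
            deleted.append(key)
    return deleted
```

The sweep keeps no state between calls. Running it twice with the same watermark deletes nothing the second time, and running it with an older watermark is merely conservative.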

Compatibility constraints

The pattern works cleanly when:

Object storage supports listing. S3, GCS, ADLS all do. Raw block storage does not.
The epoch namespace is monotonic and durable. A cluster-wide counter advanced by some cluster-epoch mechanism, not a per-broker local counter that could go backwards.
The reclamation decision can tolerate epoch granularity. All objects in epoch E are reclaimed together once M reaches E (that is, E ≤ M, matching the reclamation check). If finer-grained per-object lifecycle is required, compose with per-object policies (e.g. retention overrides for specific objects).
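
Two of these constraints can be illustrated briefly. The sketch below assumes keys of the form `<epoch>/<suffix>`; `WatermarkView` is a hypothetical local cache that never lets its view of M move backwards even if a stale update arrives, and `reclaimable_cohorts` shows that eligibility is decided per epoch, not per object.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


class WatermarkView:
    """Hypothetical local view of the safe-to-GC watermark M."""

    def __init__(self, initial: int = 0) -> None:
        self.safe_to_gc = initial

    def observe(self, advertised: int) -> int:
        # Ignore stale updates: the local view only moves forward.
        self.safe_to_gc = max(self.safe_to_gc, advertised)
        return self.safe_to_gc


def reclaimable_cohorts(keys: Iterable[str], safe_to_gc: int) -> Dict[int, List[str]]:
    """Group keys by epoch and keep only the cohorts with epoch <= M."""
    cohorts: Dict[int, List[str]] = defaultdict(list)
    for key in keys:
        epoch = int(key.split("/", 1)[0])
        if epoch <= safe_to_gc:
            cohorts[epoch].append(key)
    return dict(cohorts)
```

Every key in a returned cohort shares the same fate: either the whole epoch is at or below the watermark or none of it is.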

Anti-patterns

Side-channel epoch tag instead of in-ID stamp. Shifts the "recovery from any reference" property to the metadata store, defeating the always-available promise.
Per-broker monotonic counter (not cluster-global). Reintroduces the cross-broker coordination the pattern was designed to avoid: a watermark tracked per partition p as M(p) would have to become a 2-D quantity M(p, broker).
Stamping later than creation. A two-phase "create-then-stamp" introduces a window in which an object exists without an epoch, defeating the recovery property.
Mutable epoch. Updating an existing object's epoch reintroduces coordinated update mechanics — the whole point of the technique was to avoid this.
Sub-epoch granularity reclamation. Trying to reclaim some objects of epoch E while keeping others falls back to per-object lifecycle, defeating the cohort-reclamation property.

Relationship to other ID-encoded properties

The pattern is in a family of "encode lifecycle-relevant state in the durable identifier" techniques:

Pattern	What's encoded	Decision it enables
Epoch stamp on object ID	Cluster epoch at creation	GC by epoch comparison
Snowflake ID	Timestamp + worker ID	Time-ordered ID generation
ULID	Timestamp prefix	Lexicographic-time-ordered IDs
Time-bucketed S3 prefix	Time bucket	Lifecycle policies by prefix
Tenant ID in object key	Tenant	Tenant-scoped operations

The shared meta-pattern: the ID encodes enough state that operational decisions don't require external lookups.

Seen in

sources/2026-05-19-redpanda-cloud-topics-level-zero-garbage-collection: canonical wiki instance. The cluster epoch is a counter "that we embed in every L0 object ID at creation time." The format isn't disclosed, but the recovery property (that the epoch is retrievable from any reference to the L0 object) is load-bearing for the GC mechanism.

concepts/cluster-epoch — the primitive being stamped.
concepts/epoch-based-distributed-gc — the GC technique this pattern is half of (the stamping half).
concepts/sliding-window-epoch-tracking — the per-shard state machine that publishes the watermark this pattern's decision compares against.
patterns/per-partition-rsm-for-gc-tracking — sibling pattern: the local-state half of the GC technique.
patterns/lazy-aggregate-from-monotonic-local-state — sibling pattern: the global-aggregation half.
systems/redpanda-cloud-topics — the canonical system instance.