PATTERN Cited by 1 source

Epoch stamp on object ID for GC

Pattern

When designing a system that produces temporary objects with unbounded peak counts and a clear "processed everywhere" condition, embed a cluster-global monotonic counter (a cluster epoch) directly into the object's durable identifier at creation time: not as a side-channel attribute, not in a separate metadata store, but in the ID itself. Garbage collection then becomes a simple comparison between the stamp encoded in the ID and a cluster-wide safe-to-GC watermark, with no per-object state required to make the deletion decision.

Canonicalised by Redpanda Cloud Topics for L0 file reclamation:

"The cluster epoch is a monotonically increasing counter that we embed in every L0 object ID at creation time." (Source: sources/2026-05-19-redpanda-cloud-topics-level-zero-garbage-collection)

Mechanics

Stamping (creation):

   epoch = current_cluster_epoch()        ← from local cache;
                                            slightly stale OK
   obj_id = encode(⟨epoch, suffix⟩)        ← e.g. "1234/abc-xyz"
                                              or epoch in path prefix
   write_object(obj_id, payload)

Reclamation (decision is local, no per-object metadata):

   M = current_cluster_safe_to_gc()        ← from gossip / metadata
                                              dissemination
   for obj in list_objects(prefix):
     ⟨e, _⟩ = decode(obj.id)
     if e ≤ M:
       delete(obj)

Three properties make this work:

  1. Stamping is at creation, immutable. Once written, the object's epoch is fixed. The ID itself encodes the GC-relevant state.
  2. Decision is local-only. Deciding whether to delete needs only (obj.epoch, M) — both readable without any cross-partition lookup.
  3. No per-object metadata table. Compare to reference-counting, which requires a per-object durable count somewhere.
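
The stamping and reclamation steps above can be sketched in a few lines of Python. This is a minimal illustration, not the Cloud Topics implementation: the `<epoch>/<suffix>` layout, the function names (`encode_id`, `decode_epoch`, `reclaim`), and the dict standing in for the object store are all assumptions made for the example.

```python
import uuid
from typing import Dict, List, Optional


def encode_id(epoch: int, suffix: Optional[str] = None) -> str:
    """Build an object ID with the epoch stamped in (hypothetical layout)."""
    if suffix is None:
        suffix = uuid.uuid4().hex
    return f"{epoch}/{suffix}"


def decode_epoch(obj_id: str) -> int:
    """Recover the epoch from the ID alone; no metadata lookup is needed."""
    epoch_part, _, _ = obj_id.partition("/")
    return int(epoch_part)


def reclaim(store: Dict[str, bytes], safe_to_gc: int) -> List[str]:
    """Delete every object whose stamped epoch is <= the watermark M."""
    doomed = [key for key in store if decode_epoch(key) <= safe_to_gc]
    for key in doomed:
        del store[key]
    return sorted(doomed)


# A dict stands in for the object store (assumption for the sketch).
store = {
    encode_id(3, "a"): b"payload",
    encode_id(4, "b"): b"payload",
    encode_id(7, "c"): b"payload",
}
deleted = reclaim(store, safe_to_gc=4)
```

The only inputs to the deletion decision are the key and the watermark, which is what lets the decision stay local.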

Stamp placement choices

The pattern is silent on where in the ID the epoch goes, but the practical choices have operational consequences:

| Placement | Pros | Cons |
|---|---|---|
| Path prefix (e.g. /epoch=1234/obj.bin) | Listing by epoch is a directory scan; lifecycle policies can target by prefix | Fixed taxonomy; no flat layout |
| ID prefix bytes (e.g. 1234-abc-xyz) | Prefix listing groups by epoch; path-agnostic | Sortable but coarse |
| Embedded structured field | Explicit, parseable; allows tooling | Requires decode step |
| Side-channel attribute | Maximum flexibility | Loses the "in the ID itself" property; needs metadata store |

The Cloud Topics post does not disclose its specific format, only that the epoch is "in every L0 object ID." For object stores where prefix listing is cheap (S3, GCS, ADLS), placing the epoch in the path prefix gives epoch-grouped listing for free.
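
One practical detail with prefix placement, where the store returns keys in lexicographic order: an unpadded decimal epoch sorts incorrectly ("10" before "9"). The sketch below assumes a hypothetical `epoch=<zero-padded>/<name>` layout; the fixed width of 20 digits and the helper names are choices made for the example, not something the source specifies.

```python
EPOCH_WIDTH = 20  # assumption: wide enough for any unsigned 64-bit epoch
PREFIX = "epoch="


def path_prefix_key(epoch: int, name: str) -> str:
    """Place a zero-padded epoch in the path prefix (hypothetical layout)."""
    return f"{PREFIX}{epoch:0{EPOCH_WIDTH}d}/{name}"


def epoch_of(key: str) -> int:
    """Parse the epoch back out of the first path segment."""
    head, _, _ = key.partition("/")
    if not head.startswith(PREFIX):
        raise ValueError(f"key has no epoch prefix: {key!r}")
    return int(head[len(PREFIX):])


epochs = [9, 10, 100]

# Unpadded keys: lexicographic order disagrees with numeric order.
unpadded = sorted(f"{PREFIX}{e}/obj.bin" for e in epochs)
unpadded_order = [epoch_of(k) for k in unpadded]

# Padded keys: lexicographic order matches numeric order.
padded = sorted(path_prefix_key(e, "obj.bin") for e in epochs)
padded_order = [epoch_of(k) for k in padded]
```

With padding, a sweep that lists in key order sees the oldest epochs first and can stop at the first key whose epoch exceeds the watermark.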

Why "in the ID itself" matters

The pattern's load-bearing property is that the epoch is recoverable from any reference to the object:

  • A storage path (s3://bucket/epoch=1234/...) — recover by parsing.
  • A metadata pointer (e.g. a placeholder batch in a Raft log) — recover by decoding.
  • A listing call against object storage — recover from each returned key.

This means there is no situation where the system holds a reference to the object but doesn't know its epoch. Compare a side-channel attribute (e.g. an S3 object tag): if the attribute is lost or the metadata store is unavailable, GC stalls. The in-ID stamp is always retrievable.
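
To make the "recoverable from any reference" property concrete, here is a small sketch that extracts the epoch from either a full storage URI or a bare listing key. It assumes the hypothetical `epoch=<n>/` path segment used in the examples above; `epoch_from_reference` is an illustrative name, not an API from the source.

```python
import re
from urllib.parse import urlparse

# Matches an "epoch=<digits>/" path segment at the start or after a slash.
_EPOCH_SEGMENT = re.compile(r"(?:^|/)epoch=(\d+)/")


def epoch_from_reference(ref: str) -> int:
    """Recover the epoch from a storage URI or a bare object key."""
    path = urlparse(ref).path if "://" in ref else ref
    match = _EPOCH_SEGMENT.search(path)
    if match is None:
        raise ValueError(f"no epoch segment in {ref!r}")
    return int(match.group(1))


from_uri = epoch_from_reference("s3://bucket/epoch=1234/obj.bin")
from_key = epoch_from_reference("epoch=1234/obj.bin")
```

Both forms yield the same epoch with no call to a tag API or metadata store, so the GC decision never blocks on a second system.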

Listing-driven deletion sweep

The pattern composes naturally with a deletion sweep that periodically lists objects, reads M, and deletes everything with epoch ≤ M. No per-object tracking, no scheduling table, no cursor — the listing is the cursor, the decision is local, the sweep is idempotent (re-listing after a crash gives the same answer with at most a smaller M).
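
The idempotence claim can be checked with a small simulation. The sketch below assumes the hypothetical `<epoch>/<suffix>` key layout and an in-memory dict in place of the object store; `crash_after` is an artificial hook that aborts the sweep partway through so the restart behaviour is visible.

```python
from typing import Dict, Optional


class CrashMidSweep(Exception):
    """Simulated process crash partway through a sweep."""


def sweep(store: Dict[str, bytes], safe_to_gc: int,
          crash_after: Optional[int] = None) -> int:
    """List, compare each stamped epoch to M, delete. Returns delete count."""
    deleted = 0
    for key in sorted(store):  # the listing itself is the only cursor
        epoch = int(key.split("/", 1)[0])
        if epoch <= safe_to_gc:
            if crash_after is not None and deleted == crash_after:
                raise CrashMidSweep()
            store.pop(key, None)  # deleting a missing key is a no-op
            deleted += 1
    return deleted


store = {f"{e}/{i}": b"" for e in (1, 2, 3, 9) for i in range(2)}

try:
    sweep(store, safe_to_gc=3, crash_after=2)
except CrashMidSweep:
    pass
remaining_after_crash = len(store)

# Restart from scratch: no saved cursor, just list and compare again.
second_pass = sweep(store, safe_to_gc=3)
third_pass = sweep(store, safe_to_gc=3)
```

The restarted sweep finishes the work the crashed one left behind, and a further pass finds nothing to do; no progress marker had to survive the crash.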

The Cloud Topics post forward-references this: "Stay tuned for part 2, where we discuss how the garbage collector's design enables us to continually delete thousands of L0 objects without any locally persistent state, explicit coordination, or wasted work." The "without any locally persistent state" property is made possible by the in-ID stamp.

Compatibility constraints

The pattern works cleanly when:

  1. Object storage supports listing. S3, GCS, ADLS all do. Raw block storage does not.
  2. The epoch namespace is monotonic and durable. A cluster-wide counter advanced by some cluster-epoch mechanism, not a per-broker local counter that could go backwards.
  3. The reclamation decision can tolerate epoch granularity. All objects in epoch E are reclaimed together once the watermark M reaches E (that is, once E ≤ M). If finer-grained per-object lifecycle is required, compose with per-object policies (e.g. retention overrides for specific objects).

Anti-patterns

  • Side-channel epoch tag instead of in-ID stamp. Shifts the "recovery from any reference" property to the metadata store, defeating the always-available promise.
  • Per-broker monotonic counter (not cluster-global). Reintroduces the cross-broker coordination the pattern was designed to avoid: now M(p) would have to be a 2-D quantity M(p, broker).
  • Stamping later than creation. A two-phase "create-then-stamp" introduces a window in which an object exists without an epoch, defeating the recovery property.
  • Mutable epoch. Updating an existing object's epoch reintroduces coordinated update mechanics — the whole point of the technique was to avoid this.
  • Sub-epoch granularity reclamation. Trying to reclaim some objects of epoch E while keeping others falls back to per-object lifecycle, defeating the cohort-reclamation property.

Relationship to other ID-encoded properties

The pattern is in a family of "encode lifecycle-relevant state in the durable identifier" techniques:

| Pattern | What's encoded | Decision it enables |
|---|---|---|
| Epoch stamp on object ID | Cluster epoch at creation | GC by epoch comparison |
| Snowflake ID | Timestamp + worker ID | Time-ordered ID generation |
| ULID | Timestamp prefix | Lexicographic-time-ordered IDs |
| Time-bucketed S3 prefix | Time bucket | Lifecycle policies by prefix |
| Tenant ID in object key | Tenant | Tenant-scoped operations |

The shared meta-pattern: the ID encodes enough state that operational decisions don't require external lookups.
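
As an illustration of the same meta-pattern in another member of the family, the sketch below reads the creation time back out of a ULID. Per the ULID specification, the first 10 of its 26 Crockford base32 characters encode a 48-bit millisecond timestamp. The helper names are illustrative, and the all-zero randomness suffix is a placeholder for the example.

```python
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode_ulid_time(ms: int) -> str:
    """Encode a millisecond timestamp as the 10-character ULID time prefix."""
    chars = []
    for _ in range(10):
        chars.append(CROCKFORD[ms & 0x1F])  # lowest 5 bits first
        ms >>= 5
    return "".join(reversed(chars))


def ulid_timestamp_ms(ulid: str) -> int:
    """Decode the timestamp from a ULID's first 10 characters."""
    value = 0
    for ch in ulid[:10].upper():
        value = (value << 5) | CROCKFORD.index(ch)
    return value


created_ms = 1_700_000_000_000
ulid = encode_ulid_time(created_ms) + "0" * 16  # placeholder randomness
recovered_ms = ulid_timestamp_ms(ulid)
```

As with the epoch stamp, the state needed for the decision (here, ordering by creation time) travels with the identifier.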

Seen in

  • sources/2026-05-19-redpanda-cloud-topics-level-zero-garbage-collection: Redpanda Cloud Topics embeds the cluster epoch in every L0 object ID at creation time.
