CONCEPT

Iceberg snapshot expiry

Iceberg snapshot expiry is the metadata-garbage-collection loop that prunes old Iceberg snapshots (and the manifest and data files they uniquely reference) as the table accumulates new snapshots over time. It is the operational counterpart to Iceberg's snapshot-versioning model: every mutation produces a new snapshot, and without expiry the metadata layer grows without bound.

Source: sources/2025-04-07-redpanda-251-iceberg-topics-now-generally-available canonicalises this loop as a first-class broker-owned feature on Redpanda Iceberg Topics at GA.

Why it's needed

Iceberg's append-only snapshot model means:

  • Every write (insert, update, delete, MERGE INTO) produces a new snapshot pointer + manifest files.
  • Old snapshots remain valid queryable targets for time-travel reads until explicitly expired.
  • On a high-throughput streaming topic writing Iceberg snapshots every few seconds, metadata growth dominates: thousands of snapshots per hour, each with its own manifest file references, rapidly making catalog metadata unmanageable.
  • Data files uniquely referenced only by expired snapshots become garbage — candidates for deletion to reclaim object-storage cost.

Without snapshot expiry, three costs compound:

  1. Catalog metadata size — REST catalogs hold snapshot lists in memory; unbounded growth degrades catalog performance.
  2. Reader planning cost — readers walk the snapshot chain to resolve the current state; a long chain inflates query-plan time.
  3. Object-storage bill — obsolete data files sit in the bucket paying for themselves until explicitly deleted.
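The growth rate is easy to quantify. A minimal sketch of the arithmetic, using an assumed commit interval (the "every few seconds" cadence above; the 2-second figure is illustrative, not from the source):

```python
# Illustrative only: the commit interval is an assumption, not a source figure.
COMMIT_INTERVAL_S = 2          # one Iceberg commit every 2 seconds
SECONDS_PER_HOUR = 3600

snapshots_per_hour = SECONDS_PER_HOUR // COMMIT_INTERVAL_S   # 1,800 per hour
snapshots_per_day = snapshots_per_hour * 24                  # 43,200 per day
snapshots_per_week = snapshots_per_day * 7                   # 302,400 per week

print(snapshots_per_hour, snapshots_per_day, snapshots_per_week)
```

Each of those snapshots carries its own manifest-list and manifest-file references, which is why all three costs above scale with the length of the unexpired snapshot chain.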

What it does

An Iceberg snapshot-expiry pass:

  1. Selects snapshots to expire — typically by age (older than N hours / days) or by count (keep the last N snapshots).
  2. Updates the catalog with the new snapshot list (current + retained).
  3. Identifies orphaned files — manifest files and data files no longer referenced by any retained snapshot.
  4. Deletes the orphaned files from object storage.

Steps 1–2 are metadata-layer updates (cheap, transactional via the catalog's commit protocol). Steps 3–4 are object-storage-layer deletes (potentially large batch operations; often backgrounded).
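The four steps can be sketched in miniature. This is a toy model, not Iceberg's implementation: snapshots are plain records, the catalog commit (step 2) is only noted in a comment, and the object-storage delete (step 4) is left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int
    files: frozenset  # data + manifest file paths this snapshot references


def expire_snapshots(snapshots, now_ms, max_age_ms, min_to_keep):
    # Order newest-first; the current snapshot is ordered[0].
    ordered = sorted(snapshots, key=lambda s: s.timestamp_ms, reverse=True)
    # Step 1: select by age, but always keep at least `min_to_keep`.
    retained = [s for i, s in enumerate(ordered)
                if i < min_to_keep or now_ms - s.timestamp_ms <= max_age_ms]
    # Step 2 (modelled only): a real pass commits this new snapshot list
    # through the catalog's transactional commit protocol.
    expired = [s for s in ordered if s not in retained]
    # Step 3: a file is orphaned if no retained snapshot still references it.
    live = set().union(*(s.files for s in retained)) if retained else set()
    dead = set().union(*(s.files for s in expired)) if expired else set()
    orphans = dead - live
    # Step 4 (left to caller): batch-delete `orphans` from object storage.
    return retained, orphans
```

Note the "uniquely referenced" subtlety from above: a file shared between a retained and an expired snapshot survives, because `orphans` subtracts everything any retained snapshot still points at.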

Externalisation-cost perspective

Pre-GA Iceberg integrations (including the 2025-01-21 pedagogy disclosure of Iceberg Topics) left snapshot expiry as an externalised operational burden: someone had to run a Spark / Flink / custom job that called Iceberg's ExpireSnapshots action on a schedule. Wiki commentary on concepts/iceberg-topic and systems/redpanda-iceberg-topics explicitly flagged this as an open question:

"Compaction + GC ownership unclear from the pedagogy post. Iceberg's periodic small-file compaction + snapshot garbage collection are operational burdens someone must own — the broker, the REST catalog, or the customer. The Medallion Architecture post doesn't specify." (Source: sources/2025-01-21-redpanda-implementing-the-medallion-architecture-with-redpanda)
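The externalised job typically reduced to submitting Iceberg's documented `system.expire_snapshots` Spark procedure on a schedule. A hedged sketch of what such a job builds (the catalog and table names are placeholders; `older_than` and `retain_last` are the procedure's documented parameters):

```python
from datetime import datetime, timedelta, timezone


def expire_snapshots_sql(table: str, older_than: datetime, retain_last: int) -> str:
    """Build the Spark SQL an external cleanup job would submit.

    `system.expire_snapshots` is Iceberg's documented Spark procedure;
    `my_catalog` and the table name are placeholders.
    """
    ts = older_than.strftime("%Y-%m-%d %H:%M:%S")
    return ("CALL my_catalog.system.expire_snapshots("
            f"table => '{table}', "
            f"older_than => TIMESTAMP '{ts}', "
            f"retain_last => {retain_last})")


cutoff = datetime(2025, 4, 1, tzinfo=timezone.utc) - timedelta(days=7)
print(expire_snapshots_sql("db.events", cutoff, 10))
```

The point of the externalisation critique is everything around this one line: someone must own the Spark cluster, the schedule, the retention policy, and the failure handling.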

The 25.1 GA release closes this gap for Iceberg Topics by internalising snapshot expiry as a broker-native feature:

"Snapshot expiry that performs automatic housekeeping of old snapshot information within table metadata to manage metadata size over time." (Source: sources/2025-04-07-redpanda-251-iceberg-topics-now-generally-available)

This is the broker-as-operator move: instead of the customer running a scheduled cleanup job, the streaming broker owns the lifecycle of its Iceberg output including the GC loop.

What this does not cover

Snapshot expiry is one of two metadata-layer operational loops on Iceberg; the other is small-file compaction. Snapshot expiry prunes old metadata and data; compaction merges small current data files into larger ones to recover scan performance. The 25.1 GA post explicitly names snapshot expiry as broker-owned; it does not explicitly name compaction as broker-owned. The compaction ownership question therefore remains open on the Iceberg Topics surface as of this post: customers may still need to run compaction via a separate Spark / Flink job, or accept the small-file overhead that the tunable workload-management lag floor only partially mitigates.

Trade-offs

  • Retention window as an operator knob. Expiry is useful only if the retention window matches the longest acceptable time-travel query horizon. Too short → can't query yesterday's snapshot. Too long → metadata grows, costs grow.
  • Broker-owned vs customer-owned. Internalising expiry is a tenant-simplification win but removes operator override (no way to pin a specific snapshot for audit purposes unless the broker exposes a "retain this snapshot" hook).
  • Concurrent-writer interaction. If the Iceberg table is written by multiple producers (the Redpanda broker + a batch Spark job + a Flink job), snapshot expiry must coordinate with all of them — expiring a snapshot that another writer's commit depended on breaks consistency. The REST catalog's optimistic-concurrency commit protocol is the usual serialisation primitive.
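The serialisation primitive in the last bullet is a compare-and-swap on the table's metadata version. A toy model (not a real REST catalog client) of why a stale expiry commit is rejected when a concurrent writer lands first:

```python
import threading


class RestCatalogStub:
    """Toy optimistic-concurrency commit: a commit succeeds only if the
    caller's expected metadata version still matches the catalog's current
    one; otherwise the caller must re-read the table state and retry."""

    def __init__(self):
        self._lock = threading.Lock()
        self.version = 0
        self.snapshot_ids = [1, 2, 3]

    def commit(self, expected_version, new_snapshot_ids):
        with self._lock:  # compare-and-swap on the metadata pointer
            if expected_version != self.version:
                return False  # another writer committed first
            self.snapshot_ids = new_snapshot_ids
            self.version += 1
            return True


catalog = RestCatalogStub()
# An expiry job reads version 0 and plans to drop snapshot 1, but a
# concurrent writer commits snapshot 4 first:
assert catalog.commit(0, [1, 2, 3, 4])
# The expiry job's stale commit is rejected ...
assert not catalog.commit(0, [2, 3])
# ... so it re-reads (now version 1) and retries against fresh state:
assert catalog.commit(1, [2, 3, 4])
```

This retry loop is what prevents expiry from silently discarding a snapshot another writer's commit was built on.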
