
CONCEPT Cited by 1 source

Format co-evolution (Iceberg v4 / Delta 5.0)

Format co-evolution is the architectural direction in which two open table formats, Apache Iceberg and Delta Lake, align on a shared core data structure for metadata rather than evolving in parallel along separate spec tracks. The forward-looking disclosure in the 2026-05-28 announcement is that Iceberg v4 will introduce an "adaptive metadata tree" structure, and that Delta 5.0 is proposed to adopt the same structure.

"With Iceberg v4, we are rethinking the core metadata structure from the ground up for better performance, scalability, and interoperability. Our goal is to continuously raise the bar for performance and feature innovation, and to do so in a way that brings Iceberg and Delta Lake closer together. This is why we are also proposing that the next version of Delta, Delta 5.0, adopts the adaptive metadata tree structure."

(Source: sources/2026-05-28-databricks-advancing-apache-iceberg-on-databricks-iceberg-v3-ga-open-sharing-and-unified-governance)

Why this is architecturally significant

Iceberg and Delta have historically been competing open table formats (OTFs) with overlapping but distinct internals: different snapshot layouts, different manifest structures, different commit protocols. The market arrangement was "choose one and commit"; bridges like UniForm provided cross-format read compatibility, but the underlying writes still followed the chosen format's spec.

Format co-evolution is a different theory. If both formats share the same core metadata data structure (the "adaptive metadata tree"), then:

  1. A reader implementation written against the shared structure works for both formats. Engines reduce per-format reader code; ecosystem fragmentation drops.
  2. Format-level features that depend on the metadata structure can ship simultaneously to both formats. The 2026-05-28 Iceberg v3 release already demonstrates this with deletion vectors, row tracking, and VARIANT — Delta had analogues; Iceberg v3 brought parity. Future co-evolution makes this the default rather than a one-off catch-up.
  3. The format choice becomes operational, not strategic. Customers choose between Iceberg and Delta based on ecosystem / tooling / vendor preference rather than feature gaps; the underlying capability surface converges.
  4. The OTF battle becomes mostly settled. Two formats with shared metadata internals plus catalog interoperability (Iceberg REST, Delta Sharing now bi-format) is functionally a single open lakehouse layer with two skins.
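Point 1 above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `MetadataNode` layout, the `list_data_files` function, and the file paths are hypothetical and are not taken from any Iceberg or Delta spec, since the announcement does not describe the structure. The only idea shown is that a single traversal routine can serve two formats once both express their metadata in the same tree shape.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class MetadataNode:
    """Hypothetical node of a shared metadata tree (illustrative only)."""
    data_files: List[str] = field(default_factory=list)
    children: List["MetadataNode"] = field(default_factory=list)


def list_data_files(root: MetadataNode) -> Iterator[str]:
    """Format-agnostic reader: depth-first walk yielding data file paths."""
    yield from root.data_files
    for child in root.children:
        yield from list_data_files(child)


# Two tables, nominally from different formats, expressed in the same structure.
iceberg_table = MetadataNode(
    children=[MetadataNode(data_files=["ice/a.parquet", "ice/b.parquet"])]
)
delta_table = MetadataNode(
    data_files=["delta/x.parquet"],
    children=[MetadataNode(data_files=["delta/y.parquet"])],
)

# The same reader code handles both tables.
iceberg_files = list(list_data_files(iceberg_table))
delta_files = list(list_data_files(delta_table))
```

In this sketch the reader never inspects which format produced the tree; any format-specific behaviour would have to live outside the shared structure.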

Why "adaptive"

The 2026-05-28 announcement names the structure an "adaptive metadata tree" but does not describe its mechanism. Plausible readings of "adaptive", inferred from the announcement's stated goals ("better performance, scalability, and interoperability") rather than from any disclosed design:

  • Adaptive depth / branching based on table size — small tables stay shallow; large tables grow deeper hierarchies — to keep manifest-fetch cost roughly constant across scales.
  • Adaptive partitioning of metadata — metadata files are split by predicate-relevance so a query touching one partition fetches only the metadata for that partition.
  • Adaptive update propagation — small commits update local subtrees without rewriting the global manifest list.
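The first two readings can be made concrete with a toy model. This is a sketch under stated assumptions, not the announced design: the `Node` layout, the fixed `fanout`, and the per-node partition summary are all hypothetical. It shows depth growing with table size while a single-partition query fetches only the nodes on the relevant branch.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

FileEntry = Tuple[str, str]  # (partition value, data file path)


@dataclass
class Node:
    # Hypothetical node layout; not taken from any Iceberg or Delta spec.
    partitions: Set[str]                                  # partitions covered by this subtree
    files: List[FileEntry] = field(default_factory=list)  # populated on leaves only
    children: List["Node"] = field(default_factory=list)


def build_tree(files: List[FileEntry], fanout: int) -> Node:
    """Pack entries into leaves of `fanout` files, then group nodes until one root remains."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    entries = sorted(files)  # keeps each partition's files adjacent
    level = [
        Node({p for p, _ in entries[i:i + fanout]}, files=entries[i:i + fanout])
        for i in range(0, len(entries), fanout)
    ]
    if not level:
        return Node(set())
    while len(level) > 1:
        level = [
            Node(set().union(*(c.partitions for c in level[i:i + fanout])),
                 children=level[i:i + fanout])
            for i in range(0, len(level), fanout)
        ]
    return level[0]


def depth(node: Node) -> int:
    """Number of levels from this node down to the deepest leaf."""
    return 1 + max((depth(c) for c in node.children), default=0)


def scan(node: Node, partition: str) -> Tuple[List[str], int]:
    """Return (matching paths, nodes fetched), skipping subtrees that cannot match."""
    paths = [path for p, path in node.files if p == partition]
    fetched = 1  # this node had to be read
    for child in node.children:
        if partition in child.partitions:  # prune using the child's summary
            sub_paths, sub_fetched = scan(child, partition)
            paths += sub_paths
            fetched += sub_fetched
    return paths, fetched


small = build_tree([("p0", "a.parquet"), ("p0", "b.parquet")], fanout=4)
large = build_tree([(f"p{i // 16}", f"f{i:03d}.parquet") for i in range(64)], fanout=4)
paths, fetched = scan(large, "p1")
```

With these inputs the two-file table is a single node (depth 1), the 64-file table has depth 3, and the scan for partition `p1` reads 6 of the 21 nodes while returning all 16 matching files. A real design would also need to handle rebalancing and non-uniform partitions, which this toy ignores.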

The structure is presumably a B-tree or similar self-balancing tree at the metadata layer, in contrast to the flat manifest list (a list of manifests) that Iceberg has historically used and to Delta's _delta_log pattern of JSON-lines commit files. The specific design is to be determined by the spec.
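To illustrate why a tree could make small commits cheaper than a flat list, the following sketch uses path copying on an immutable tree. It is an assumption-laden illustration: `Node`, `append_file`, and the child-index path are hypothetical names, and nothing here is claimed to match the eventual spec. A commit that touches one leaf writes new nodes only along the root-to-leaf path and reuses every other subtree unchanged.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Node:
    # Hypothetical immutable metadata node (illustrative only).
    files: Tuple[str, ...] = ()
    children: Tuple["Node", ...] = ()


def append_file(root: Node, path_to_leaf: Sequence[int], new_file: str) -> Tuple[Node, int]:
    """Return (new_root, nodes_written) after adding `new_file` to one leaf.

    `path_to_leaf` is the sequence of child indexes leading from the root
    to the target leaf. Only nodes on that path are recreated.
    """
    if not path_to_leaf:
        return Node(files=root.files + (new_file,), children=root.children), 1
    idx = path_to_leaf[0]
    new_child, written = append_file(root.children[idx], path_to_leaf[1:], new_file)
    children = root.children[:idx] + (new_child,) + root.children[idx + 1:]
    return Node(files=root.files, children=children), written + 1


# A 3 x 3 tree: root -> 3 inner nodes -> 3 leaves each, one file per leaf.
root = Node(children=tuple(
    Node(children=tuple(Node(files=(f"f{i}{j}.parquet",)) for j in range(3)))
    for i in range(3)
))

new_root, written = append_file(root, (1, 2), "new.parquet")
```

Here the commit writes three nodes (root, one inner node, one leaf) out of thirteen, and the old root remains a valid snapshot because nothing is mutated in place.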

Caveats

  • Forward-looking only. As of the 2026-05-28 announcement, the adaptive metadata tree is a proposal, not a shipping spec. No Iceberg v4 spec, no Delta 5.0 spec, no benchmarks, no commitment date.
  • No mechanism description. The announcement names the structure but does not describe its design. The deferred-internals reference is the conference session "Format Co-Evolution: How Iceberg v4 and Delta 5.0 Share a Unified Metadata".
  • Vendor-driven proposal. Databricks is the vendor announcing the alignment; whether the broader Iceberg community adopts the same structure on Iceberg's own timeline is undisclosed.
  • Doesn't address Hudi. Apache Hudi, the third major OTF, is not part of the disclosed co-evolution direction.
  • Migration path for existing tables undisclosed. How tables on the v2/v3 metadata structure migrate to the adaptive metadata tree is not addressed.
  • Wiki ingest as directional concept. This page captures the architectural intent; deeper detail will land when the spec drops.

