Apache Iceberg v3¶

Apache Iceberg v3 is the third major version of the Apache Iceberg table-format spec; support for it reached General Availability on Databricks on 2026-05-28. The release adds three format-level primitives (deletion vectors, row tracking, and the VARIANT type), each of which closes a gap between Iceberg's snapshot semantics and modern lakehouse workloads: incremental updates, change-driven processing, and semi-structured data ingestion.

Verbatim positioning from the announcing source:

"With Iceberg v3 now generally available on Databricks, customers get support for deletion vectors, row tracking, and VARIANT across managed Iceberg tables, foreign Iceberg tables, and UniForm-enabled managed tables. These capabilities close important gaps between performance and interoperability: deletion vectors accelerate updates, merges, and deletes; row tracking supports more efficient incremental processing; and VARIANT provides a standard representation for semi-structured data. These features also work seamlessly across both Delta and Iceberg tables, enabling interoperability without rewriting data."

(Source: sources/2026-05-28-databricks-advancing-apache-iceberg-on-databricks-iceberg-v3-ga-open-sharing-and-unified-governance)

The three v3 features¶

Deletion vectors¶

A file-level row-delete representation: instead of rewriting an entire data file when one or more rows are deleted, the table records a small "deletion vector" file that marks the deleted rows in the original file as logically absent. Subsequent reads merge the deletion vector with the data file at query time, skipping the marked rows.
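The read-side merge described above can be sketched in a few lines of Python. This is an illustration only: the deletion vector is modelled as a plain set of row positions and the data file as a list of dicts, whereas the real on-disk encoding is defined by the Iceberg spec. The function name `read_with_deletion_vector` is hypothetical.

```python
from typing import Iterator


def read_with_deletion_vector(
    rows: list[dict], deleted_positions: set[int]
) -> Iterator[dict]:
    """Yield rows from a data file, skipping positions marked as deleted."""
    for position, row in enumerate(rows):
        if position not in deleted_positions:
            yield row


# The data file is immutable: the delete never rewrites it.
data_file = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c"},
    {"id": 4, "name": "d"},
]

# Deleting ids 2 and 4 only records their row positions (1 and 3).
deletion_vector = {1, 3}

visible_rows = list(read_with_deletion_vector(data_file, deletion_vector))
```

The data file still holds all four rows after the delete; only the reader's view changes.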

Architecturally, deletion vectors belong to the merge-on-read family of update strategies: write amplification drops sharply (deleting 100 rows from a 1 GB Parquet file no longer requires rewriting the file) at the cost of a read-side merge step. Compaction eventually rewrites the file with the deletions absorbed.
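As a rough back-of-envelope for the 100-rows-in-1-GB example, the bytes written by each strategy can be compared directly. The figure of 8 bytes per deleted position is an assumption made purely for illustration; real deletion-vector encodings are typically compressed bitmaps and are not described in the announcing source.

```python
GIB = 1024 ** 3


def bytes_written_copy_on_write(file_size_bytes: int) -> int:
    # Copy-on-write rewrites the whole data file minus the deleted rows;
    # for a small delete this is approximately the full file size.
    return file_size_bytes


def bytes_written_merge_on_read(deleted_rows: int, bytes_per_position: int = 8) -> int:
    # ASSUMPTION: each deleted row position costs 8 bytes, uncompressed.
    return deleted_rows * bytes_per_position


cow_bytes = bytes_written_copy_on_write(1 * GIB)
mor_bytes = bytes_written_merge_on_read(100)

# How many times more bytes the copy-on-write path writes for this delete.
write_ratio = cow_bytes // mor_bytes
```

Under these assumptions the copy-on-write path writes more than a million times as many bytes for the same logical delete; the read path pays for that saving on every scan until compaction runs.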

See concepts/deletion-vector for the canonical wiki page.

Row tracking¶

A stable per-row identity that survives table-level rewrites (compaction, schema-evolution, file-rewrite). Without row tracking, an incremental consumer of the table cannot reliably tell "which physical rows changed" across commits because compaction may have rewritten unchanged rows into different files. Row tracking gives every row a stable identifier, making incremental processing pipelines (CDC consumers, materialized-view incremental refresh, ML feature recomputation) substantially more efficient — they can reason about row-level changes without re-scanning everything.
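A minimal sketch of why a stable identifier matters, assuming a toy model in which each snapshot maps a physical location `(file, position)` to a `(row_id, value)` pair. The snapshot layout and both function names are hypothetical and do not reflect the metadata columns defined by the Iceberg spec.

```python
# Each snapshot maps a physical location (file, position) to (row_id, value).
snapshot_before = {
    ("file-a", 0): (100, "alice"),
    ("file-a", 1): (101, "bob"),
    ("file-b", 0): (102, "carol"),
}

# Compaction merged both files into file-c; only row 101 changed its value.
snapshot_after = {
    ("file-c", 0): (100, "alice"),
    ("file-c", 1): (101, "robert"),
    ("file-c", 2): (102, "carol"),
}


def changed_by_location(before: dict, after: dict) -> set[int]:
    """Naive diff keyed on physical location: compaction looks like total churn."""
    return {
        row_id
        for location, (row_id, value) in after.items()
        if before.get(location) != (row_id, value)
    }


def changed_by_row_id(before: dict, after: dict) -> set[int]:
    """Diff keyed on the stable row id: only real value changes are reported."""
    before_by_id = dict(before.values())
    after_by_id = dict(after.values())
    return {
        row_id
        for row_id, value in after_by_id.items()
        if before_by_id.get(row_id) != value
    }


churn_without_tracking = changed_by_location(snapshot_before, snapshot_after)
churn_with_tracking = changed_by_row_id(snapshot_before, snapshot_after)
```

The location-keyed diff flags every row because compaction moved all of them, while the id-keyed diff reports only row 101. The sketch ignores deleted rows for brevity.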

See concepts/row-tracking for the canonical wiki page.

VARIANT type¶

A standard representation for semi-structured data (JSON-shaped values where the schema is open or unknown at write time). VARIANT lets engines store and query JSON-shaped columns natively without flattening the structure into many string columns or pushing the entire payload through an opaque binary blob. It is the open-table-format (OTF) analogue of the VARIANT type in Snowflake, Spark, and Databricks SQL, but standardised at the Iceberg-spec level so that a VARIANT column written by one engine reads identically in another.
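The query model can be illustrated with a small Python sketch: values with differing shapes sit in one column and are addressed by path at read time, with no up-front flattening. Parsed JSON stands in for the VARIANT value here; the real type uses a binary encoding that this sketch does not model, and the helper `variant_get` is a hypothetical function written for this example, not an engine API.

```python
import json
from typing import Any


def variant_get(value: Any, path: str, default: Any = None) -> Any:
    """Hypothetical helper: walk a dotted path through a nested value."""
    current = value
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif isinstance(current, list) and key.isdigit() and int(key) < len(current):
            current = current[int(key)]
        else:
            # Missing field or wrong shape: fall back instead of failing.
            return default
    return current


# Two events with different shapes stored in the same logical column.
events = [
    json.loads('{"user": {"id": 7, "tags": ["a", "b"]}, "type": "click"}'),
    json.loads('{"user": {"id": 9}, "type": "view", "duration_ms": 120}'),
]

user_ids = [variant_get(e, "user.id") for e in events]
first_tags = [variant_get(e, "user.tags.0") for e in events]
durations = [variant_get(e, "duration_ms", 0) for e in events]
```

Fields missing from a given row resolve to the default rather than raising, which is the behaviour an open schema needs.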

See concepts/variant-type for the canonical wiki page.

Cross-format interoperability¶

The structurally significant claim in the GA announcement is that the same v3 features work across both formats: "These features also work seamlessly across both Delta and Iceberg tables, enabling interoperability without rewriting data." Delta Lake already had file-level deletion vectors and an analog of row tracking; Iceberg v3 brings parity, and the joint Databricks story is that data written through one format can be read through the other (via UniForm) with the v3 features intact.

This sets up the further alignment in the same announcement: Iceberg v4 + Delta 5.0 are slated to share an "adaptive metadata tree" structure — see concepts/format-co-evolution-iceberg-delta.

Coverage scope on Databricks¶

The GA disclosure scopes Iceberg v3 features to three table types on Databricks:

Managed Iceberg tables in Unity Catalog (UC owns layout / optimisation).
Foreign Iceberg tables registered in UC (governed via UC, but stored and managed by an external catalog such as AWS Glue, Snowflake Horizon, or Hive Metastore).
UniForm-enabled managed tables — Delta tables that also expose an Iceberg-compatible read surface; v3 features apply on the Iceberg side.

Caveats¶

Spec-level details deferred to docs. The announcing source describes the role of each v3 feature but does not document on-disk format, compaction-interaction, or compatibility-matrix. Mechanism depth requires the Iceberg spec and Databricks Iceberg-v3 docs.
No quantitative benchmarks disclosed. No latency / write-amp / read-amp / storage-overhead numbers in the announcement.
Interaction with UC ABAC undisclosed. The announcement does not address how Unity Catalog attribute-based access control (ABAC) row filters compose with deletion vectors: does a row filter see the logically deleted rows, and what is the policy-evaluation order?
Engine-side support varies. v3 is GA on Databricks; other engines that read Iceberg (Spark / Trino / Flink / Snowflake / DuckDB) need their own v3-aware readers to consume v3 features. Compatibility-matrix not disclosed in the announcing source.

Seen in¶

sources/2026-05-28-databricks-advancing-apache-iceberg-on-databricks-iceberg-v3-ga-open-sharing-and-unified-governance — GA announcement. The three v3 features named, scoped to managed / foreign / UniForm-enabled tables, with the cross-format-interoperability claim verbatim. No mechanism depth on any single feature; positioning is "close important gaps between performance and interoperability".

Source¶

Original: https://www.databricks.com/blog/unity-catalog-and-next-era-apache-icebergtm
Spec reference: https://iceberg.apache.org/spec/
Databricks docs: https://docs.databricks.com/aws/en/iceberg/iceberg-v3

Related¶

systems/apache-iceberg — parent table format.
systems/delta-lake — sibling OTF; v3 features cross-compatible via UniForm.
systems/unity-catalog — managing catalog for v3-enabled tables on Databricks.
concepts/deletion-vector, concepts/row-tracking, concepts/variant-type — the three v3 features as concepts.
concepts/copy-on-write-merge, concepts/merge-on-read — update-strategy taxonomy that deletion vectors slot into.
concepts/format-co-evolution-iceberg-delta — the Iceberg v4 + Delta 5.0 shared-metadata next step.