CONCEPT Cited by 1 source
VARIANT type¶
VARIANT is a column-level data type that stores semi-structured (JSON-shaped) values with their internal structure preserved and queryable, without requiring the column's schema to be known at write time. It is the OTF analogue of the long-established VARIANT types in Snowflake, Databricks SQL, and Apache Spark — but standardised at the table-format spec level so that values written by one engine can be read identically by another.
A VARIANT column can hold:
- A JSON object (
{"foo": 1, "bar": "x", "nested": {"a": [1, 2, 3]}}). - A JSON array.
- A JSON scalar (number, string, boolean, null).
- Any mixture row-to-row — the type is genuinely heterogeneous.
The engine can query into VARIANT values with path expressions (event:user.id, payload['items'][0]['sku']) and project / cast values out of the structure.
Why it matters¶
VARIANT closes a structural gap in lakehouse table formats: how to ingest semi-structured event streams without flattening or stringifying them at write time.
Before VARIANT, the typical patterns were:
- Flatten at write time. Pre-define a wide schema with one column per JSON field, parse the JSON at ingest, and discard fields not in the schema. Loses fidelity; requires schema-evolution every time a new field appears upstream.
- Store as STRING and re-parse at query time. Lossless, but the engine cannot push down predicates or column projections into the JSON at query plan time — every read has to parse the full string. Slow at scale.
- Store as nested STRUCT with a fixed schema. Same loss-of-fidelity problem as flattening; doesn't handle heterogeneous payloads.
- Store as binary BLOB with custom serialisation. Fully opaque to the engine; cannot index, query, or analyse without a custom UDF round-trip.
VARIANT replaces all four with a first-class engine-aware type that:
- Preserves the full JSON structure losslessly.
- Allows path-based queries that the engine can plan and optimise.
- Supports schema-on-read — fields can be cast / projected into typed columns lazily.
- Doesn't require schema evolution every time upstream payloads change shape.
Compare concepts/json-column-as-schema-escape-hatch for the architectural pattern VARIANT formalises.
Iceberg v3 disclosure¶
Iceberg v3 adds VARIANT to the spec. From the 2026-05-28 announcement:
"VARIANT provides a standard representation for semi-structured data."
Cross-format point: "These features also work seamlessly across both Delta and Iceberg tables, enabling interoperability without rewriting data." (Source: sources/2026-05-28-databricks-advancing-apache-iceberg-on-databricks-iceberg-v3-ga-open-sharing-and-unified-governance)
The architectural significance is standardisation across formats and engines. Snowflake's VARIANT, Spark's VARIANT, and Databricks SQL's VARIANT have been format-internal until now; v3 brings the type to the open Iceberg spec, so a VARIANT column written by an Iceberg writer is portable across engines that read v3.
Caveats¶
- On-disk encoding deferred to spec. The announcing source does not document the binary encoding of VARIANT values; refer to the Iceberg v3 spec and Databricks docs for details.
- Engine-side support varies. v3 VARIANT is GA on Databricks; other Iceberg engines need v3-aware readers and a VARIANT-aware query planner. Compatibility-matrix not disclosed.
- No quantitative numbers on storage overhead, query-path overhead, or compaction interaction.
- Indexing strategy undisclosed. Whether VARIANT columns can be indexed (analogous to PostgreSQL's GIN-on-JSONB or Snowflake's micro-partition statistics on VARIANT fields) is not addressed in the announcing source.
Seen in¶
- sources/2026-05-28-databricks-advancing-apache-iceberg-on-databricks-iceberg-v3-ga-open-sharing-and-unified-governance — GA announcement of Iceberg v3 VARIANT on Databricks. Named alongside deletion vectors and row tracking as one of three v3 primitives; cross-format compatibility with Delta disclosed; no mechanism depth.
Related¶
- systems/iceberg-v3 — v3 milestone introducing VARIANT to the Iceberg spec.
- systems/apache-iceberg — parent table format.
- systems/delta-lake — sibling format; cross-compatible VARIANT.
- systems/snowflake — pre-existing VARIANT-shaped type that v3 standardises across engines.
- concepts/schema-evolution — the cost VARIANT removes from semi-structured-data ingestion.
- concepts/json-column-as-schema-escape-hatch — the architectural pattern VARIANT formalises in OTFs.
- concepts/open-table-format — umbrella concept.