
SYSTEM Cited by 14 sources

Apache Iceberg

Apache Iceberg is an open table format that sits above columnar data files (typically systems/apache-parquet) on an object store and provides a table abstraction (atomic row-level updates, schema evolution, hidden partitioning, and snapshot-based versioning) over storage that is fundamentally immutable and object-scoped. It was originally developed at Netflix and open-sourced in 2017.

Why it exists (per the S3-at-19 post)

Parquet became the de facto format for tabular data on object storage in the early 2010s, paired with Hadoop/Hive to build "data lake" tables. But customer demand moved beyond append-only scans to:

  • Row-level insert/update without rewriting entire tables.
  • Schema evolution (add/remove/rename columns without migration).
  • Time travel / versioning — query the state of the table at a specific point in the past.

These are hard to achieve directly over immutable objects — see concepts/immutable-object-storage.

(Source: sources/2025-03-14-allthingsdistributed-s3-simplicity-is-table-stakes)

How Iceberg solves it

A metadata layer over the data objects: each logical table version is a snapshot that describes the set of data files making up the table at that moment. A mutation produces a new snapshot that references the mostly unchanged existing files plus a small delta; consumers read by resolving the current snapshot to its file set.

Result:

  • Small updates don't require rewriting the whole table.
  • The table is implicitly versioned — stepping backward/forward in time is reading an older/newer snapshot pointer.
  • Each commit atomically swaps the table's current-snapshot pointer, so the snapshot is the unit of atomicity; this gives databases the transactional semantics they expect.
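The snapshot mechanics above can be illustrated with a toy model. This is a hypothetical sketch, not Iceberg's implementation: real snapshots point at manifest lists and manifests rather than holding a flat file set, and the pointer swap happens in a catalog. The names (Table, commit, CommitConflict) are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    data_files: FrozenSet[str]  # simplification: the full live file set at this version


class CommitConflict(Exception):
    """Raised when another writer moved the current pointer first."""


class Table:
    """Toy table: immutable snapshots plus a single mutable 'current' pointer."""

    def __init__(self) -> None:
        self.snapshots: Dict[int, Snapshot] = {}
        self.current_id: Optional[int] = None
        self._next_id = 1

    def files(self, snapshot_id: Optional[int] = None) -> FrozenSet[str]:
        """Resolve a snapshot (default: current) to its data files; older ids give time travel."""
        sid = self.current_id if snapshot_id is None else snapshot_id
        return frozenset() if sid is None else self.snapshots[sid].data_files

    def commit(self, base_id: Optional[int], added: Iterable[str] = (),
               removed: Iterable[str] = ()) -> int:
        # Optimistic concurrency: the writer states which snapshot it built on,
        # and the commit fails if the pointer has moved since then.
        if base_id != self.current_id:
            raise CommitConflict(f"based on {base_id}, but current is {self.current_id}")
        new_files = (self.files(base_id) - frozenset(removed)) | frozenset(added)
        snap = Snapshot(self._next_id, base_id, new_files)
        self.snapshots[snap.snapshot_id] = snap
        self.current_id = snap.snapshot_id  # the one atomic step: swap the pointer
        self._next_id += 1
        return snap.snapshot_id


t = Table()
s1 = t.commit(None, added=["a.parquet", "b.parquet"])
s2 = t.commit(s1, added=["c.parquet"], removed=["a.parquet"])
print(sorted(t.files()))    # ['b.parquet', 'c.parquet']
print(sorted(t.files(s1)))  # ['a.parquet', 'b.parquet']  (time travel)
```

A concurrent writer that also built on s1 would get CommitConflict and must re-read the table at s2 and retry, which is roughly how Iceberg writers handle commit contention.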

Externalisation cost (the gap S3 Tables closes)

Because Iceberg's structure is externalised — customer code owns the data/metadata object relationships — several burdens fall on the customer:

  • Compaction to merge small snapshot deltas into larger files and recover scan performance.
  • Garbage collection to reclaim space from superseded snapshot files.
  • Tiering-policy awareness — S3 Intelligent-Tiering doesn't know about Iceberg's logical layout, so tiering can misbehave.
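Garbage collection reduces to a reachability question: once a snapshot is expired, any file that no retained snapshot references can be deleted. A minimal sketch, assuming each snapshot is modelled as a plain set of file paths. This is a simplification: Iceberg ships maintenance procedures such as expire_snapshots and remove_orphan_files, and this is not their implementation. The function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def retain_last(commit_order: List[int], keep: int) -> Set[int]:
    """Pick the `keep` most recent snapshot ids (commit_order is oldest first)."""
    return set(commit_order[-keep:]) if keep > 0 else set()


def reclaimable_files(snapshots: Dict[int, FrozenSet[str]], retain: Set[int]) -> Set[str]:
    """Files referenced only by expired snapshots; safe to delete."""
    everything = set().union(*snapshots.values())
    still_live = set().union(*(snapshots[s] for s in retain))
    return everything - still_live


snapshots = {
    1: frozenset({"a.parquet", "b.parquet"}),
    2: frozenset({"b.parquet", "c.parquet"}),  # a.parquet rewritten away
    3: frozenset({"c.parquet", "d.parquet"}),  # b.parquet rewritten away
}
keep = retain_last([1, 2, 3], keep=2)
print(sorted(reclaimable_files(snapshots, keep)))  # ['a.parquet']
```

Retaining more snapshots preserves a longer time-travel window at the cost of storage; that trade-off is what the customer has to own when the structure is externalised.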

Andy Warfield (2025):

"Iceberg and other open table formats like it are effectively storage systems in their own right, but because their structure is externalised – customer code manages the relationship between iceberg data and metadata objects, and performs tasks like garbage collection – some challenges emerge."

This gap is what motivated systems/s3-tables — S3 absorbing compaction / GC / tiering as managed operations, and exposing the table itself as the first-class policy resource.

Ecosystem

  • Iceberg REST Catalog (IRC) API — standard catalog protocol Iceberg clients speak; S3 Tables added IRC support within 14 weeks of launch.
  • DuckDB Iceberg — collaboration called out in the 2025 S3 post to accelerate in-engine Iceberg reads.
  • Native readers/writers in Spark, Flink, Trino, Snowflake, Presto, and others.
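To make the REST catalog protocol concrete, here is a sketch of the request a writer might send to commit a new snapshot. It only builds the path and JSON body and sends nothing. The route and field names (a POST to /v1/{prefix}/namespaces/{namespace}/tables/{table} carrying requirements and updates, the assert-ref-snapshot-id requirement, the add-snapshot and set-snapshot-ref updates) reflect our reading of the Iceberg REST Catalog OpenAPI spec and should be checked against it; the snapshot payload is abbreviated and its values are made up.

```python
import json
from typing import Any, Dict
from urllib.parse import quote


def table_path(namespace: str, table: str, prefix: str = "") -> str:
    """Route for a single-level namespace; nested namespaces use a separate encoding."""
    parts = ["v1"]
    if prefix:  # optional catalog-specific prefix
        parts.append(quote(prefix, safe=""))
    parts += ["namespaces", quote(namespace, safe=""), "tables", quote(table, safe="")]
    return "/" + "/".join(parts)


def commit_request(base_snapshot_id: int, snapshot: Dict[str, Any]) -> str:
    body = {
        # Optimistic concurrency: reject unless 'main' still points at the snapshot we read.
        "requirements": [
            {"type": "assert-ref-snapshot-id", "ref": "main", "snapshot-id": base_snapshot_id},
        ],
        "updates": [
            {"action": "add-snapshot", "snapshot": snapshot},
            {"action": "set-snapshot-ref", "ref-name": "main", "type": "branch",
             "snapshot-id": snapshot["snapshot-id"]},
        ],
    }
    return json.dumps(body)


new_snapshot = {  # abbreviated, hypothetical values
    "snapshot-id": 2,
    "parent-snapshot-id": 1,
    "timestamp-ms": 1700000000000,
    "manifest-list": "s3://example-bucket/orders/metadata/snap-2.avro",
    "summary": {"operation": "append"},
}
print("POST", table_path("sales", "orders"))  # POST /v1/namespaces/sales/tables/orders
print(commit_request(1, new_snapshot))
```

The requirements list is what lets the catalog act as the atomic pointer swap: if another writer committed first, the server rejects the request and the client rebuilds on the new table state.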

Seen in

  • sources/2026-05-21-yelp-how-partition-access-visualizations-reduced-our-data-lake-s3-cost-by-33: Iceberg as the migration target, prioritised by usage data. Yelp's data-platform team set out "to identify active tables and partitions to prioritize our adoption of Apache Iceberg"; the granular usage-attribution effort that ultimately drove a 33% S3 cost reduction began as an Iceberg-migration-prioritisation tool. Canonical wiki instance of patterns/usage-driven-migration-prioritization applied to Iceberg adoption: "focus our migration efforts on active tables and partitions that would add the most customer value. This enabled the team to provide Apache Iceberg's read performance benefits to the most valuable use cases first." No table-count or migration-speed-up numbers disclosed.
  • sources/2026-01-06-redpanda-build-a-real-time-lakehouse-architecture-with-redpanda-and-databricks: Netflix-origin disclosure at Databricks-speaker altitude. Jason Reed (Databricks, formerly on Netflix's data team) is the cited architectural voice on Iceberg's origin, verbatim: "these challenges ultimately led to the creation of Apache Iceberg — initially developed internally at Netflix and later open-sourced as an Apache project", and on its architectural position: "Iceberg provides a foundation that looks and behaves like a warehouse table, while remaining open and cloud-native." The post also traces how catalog shape evolved: "Early Iceberg catalogs were often implemented as collections of files stored directly in object storage. But with many users, workloads, and vendors creating and managing Iceberg tables across shared object storage, metadata sprawl and governance gaps were becoming dire." This sets the stage for the REST catalog era. Joint-vendor framing, with Redpanda's Iceberg Topics as the streaming-broker producer and Unity Catalog as the governance endpoint. No mechanism depth beyond prior wiki coverage.
  • sources/2025-03-14-allthingsdistributed-s3-simplicity-is-table-stakes — Iceberg's design, what it externalises, and the motivation for pulling it into S3 as a first-class construct.
  • sources/2026-04-20-databricks-mercedes-benz-cross-cloud-data-mesh — Iceberg on AWS Glue as the Mercedes-Benz after-sales source of truth; systems/unity-catalog federates the Iceberg tables so they can be shared out as Delta via systems/delta-sharing without rewriting into Delta first — a concrete case of OTF→OTF translation at the catalog/federation boundary.
  • sources/2026-04-07-allthingsdistributed-s3-files-and-the-changing-face-of-s3 — Warfield's S3 Files launch essay positions Iceberg as the structural-data precursor that led to the platform's patterns/presentation-layer-over-storage thesis: "Iceberg was clearly helping lift the level of abstraction for tabular data on S3, [but] it also still carried a set of sharp edges because it was having to surface tables strictly over the object API." Customer scale cited: over 2M tables stored in systems/s3-tables today, off the back of the Iceberg installed base.
  • sources/2024-07-29-aws-amazons-exabyte-scale-migration-from-apache-spark-to-ray-on-ec2 — named as one of the three open table formats Amazon BDT's Ray compactor (contributed to systems/deltacat) is designed to extend to, alongside systems/apache-hudi and systems/delta-lake. The post also recognises Iceberg (and Hudi) as having canonicalised the "copy-on-write merge" name for the compaction strategy Amazon BDT was already running in-house in 2019 over its exabyte-scale catalog — years before publishing the shared vocabulary.
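The usage-driven prioritisation in the Yelp entry can be sketched as a ranking over access logs. A hypothetical sketch: the event shape (table, partition, read time), the lookback cutoff, and the function name are assumptions; the post does not describe Yelp's actual pipeline at this level.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple

AccessEvent = Tuple[str, str, datetime]  # (table, partition, read time); hypothetical log shape


def migration_order(events: Iterable[AccessEvent],
                    since: datetime) -> List[Tuple[Tuple[str, str], int]]:
    """Rank (table, partition) pairs by reads since `since`, most-read first.

    Partitions with no recent reads never appear, so they fall to the back of the queue."""
    recent = Counter((table, part) for table, part, ts in events if ts >= since)
    return recent.most_common()


events = [
    ("orders", "dt=2026-05-01", datetime(2026, 5, 20)),
    ("orders", "dt=2026-05-01", datetime(2026, 5, 21)),
    ("clicks", "dt=2026-05-02", datetime(2026, 5, 21)),
    ("legacy", "dt=2019-01-01", datetime(2024, 1, 1)),  # stale: outside the window
]
for target, reads in migration_order(events, since=datetime(2026, 5, 1)):
    print(target, reads)
```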

Row-level update surfaces: MERGE INTO vs INSERT OVERWRITE

Iceberg exposes two SQL surfaces for updating table state, and they operate at very different granularities:

  • INSERT OVERWRITE — replaces an entire partition (or table). Strict schema requirements (column names, types, and order must match). Good for batch full-partition refreshes; a footgun for targeted row-level updates because it rewrites orders of magnitude more data than necessary.
  • MERGE INTO — conditional row-level upsert/delete/insert against a matching condition; only touches affected rows; more flexible schema matching. Canonical fit for CDC ingest, slowly-changing dimensions, and incremental merges.
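The granularity difference can be shown with a toy model that counts logical rows written. A hypothetical sketch: a partition is a dict keyed by an assumed id column, and the functions mimic the semantics of the two SQL surfaces rather than Iceberg's physical write path (how many bytes MERGE INTO actually rewrites depends on the COW/MOR strategy).

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]
Partition = Dict[int, Row]  # keyed by a hypothetical 'id' column


def insert_overwrite(new_rows: List[Row]) -> Tuple[Partition, int]:
    """INSERT OVERWRITE semantics: the caller must supply the full new partition."""
    replaced = {row["id"]: row for row in new_rows}
    return replaced, len(replaced)  # every row in the partition is written


def merge_into(target: Partition, changes: List[Tuple[str, Row]]) -> Tuple[Partition, int]:
    """MERGE INTO semantics: apply keyed changes, touching only matched or new rows.

    Each change is ('upsert', row) for WHEN MATCHED UPDATE / WHEN NOT MATCHED INSERT,
    or ('delete', {'id': ...}) for WHEN MATCHED DELETE."""
    result = dict(target)
    touched = 0
    for op, row in changes:
        key = row["id"]
        if op == "delete":
            if result.pop(key, None) is not None:
                touched += 1
        else:
            result[key] = row
            touched += 1
    return result, touched


partition = {i: {"id": i, "status": "open"} for i in range(10_000)}
cdc_batch = [
    ("upsert", {"id": 42, "status": "closed"}),    # update an existing row
    ("upsert", {"id": 10_000, "status": "open"}),  # insert a new row
    ("delete", {"id": 7}),
]
merged, merge_rows = merge_into(partition, cdc_batch)
_, overwrite_rows = insert_overwrite(list(merged.values()))
print(merge_rows, overwrite_rows)  # 3 10000
```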

Underneath MERGE INTO, Iceberg offers two row-level update strategies:

  • Copy-on-Write (COW): rewrite every data file that contains a changed row. Strong consistency and immediate visibility, but resource-intensive. See concepts/copy-on-write-merge.
  • Merge-on-Read (MOR): write updates as separate delta files that are merged with base data at query time. Optimizes write performance but requires periodic compaction to keep the read side fast. See concepts/merge-on-read.
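A toy comparison of the two strategies for a single-row delete. A hypothetical sketch: data files are in-memory lists, merge-on-read deletes are modelled as per-file sets of deleted positions (Iceberg's actual delete-file formats differ), and rows_written is only a rough proxy for write amplification.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Set

Predicate = Callable[[dict], bool]


@dataclass
class DataFile:
    path: str
    rows: List[dict]


@dataclass
class ToyTable:
    files: List[DataFile]
    deletes: Dict[str, Set[int]] = field(default_factory=dict)  # MOR: path -> deleted positions
    rows_written: int = 0

    def _live(self, f: DataFile) -> List[dict]:
        dead = self.deletes.get(f.path, set())
        return [r for pos, r in enumerate(f.rows) if pos not in dead]

    def delete_cow(self, pred: Predicate) -> None:
        """Copy-on-write: rewrite every file holding a matching row, without it."""
        new_files = []
        for f in self.files:
            live = self._live(f)
            if any(pred(r) for r in live):
                kept = [r for r in live if not pred(r)]
                new_files.append(DataFile(f.path + ".rewritten", kept))
                self.deletes.pop(f.path, None)  # pending deletes are folded into the rewrite
                self.rows_written += len(kept)
            else:
                new_files.append(f)
        self.files = new_files

    def delete_mor(self, pred: Predicate) -> None:
        """Merge-on-read: record deleted positions; base files are untouched."""
        for f in self.files:
            for pos, r in enumerate(f.rows):
                if pred(r):
                    self.deletes.setdefault(f.path, set()).add(pos)
                    self.rows_written += 1  # one small delete record per row

    def scan(self) -> Iterator[dict]:
        """Readers pay the merge cost: pending deletes are applied at query time."""
        for f in self.files:
            yield from self._live(f)


def make_table() -> ToyTable:
    return ToyTable([DataFile(f"f{n}.parquet", [{"id": n * 1000 + i} for i in range(1000)])
                     for n in range(2)])


cow, mor = make_table(), make_table()
cow.delete_cow(lambda r: r["id"] == 5)
mor.delete_mor(lambda r: r["id"] == 5)
print(cow.rows_written, mor.rows_written)  # 999 1
```

Both tables return the same rows, but every MOR scan now does extra filtering work, which is why periodic compaction matters for MOR-backed tables.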

Operational prescription (patterns/merge-into-over-insert-overwrite): default to MERGE INTO backed by MOR for incremental workloads, and reserve INSERT OVERWRITE for genuine full-partition rewrites. The load-bearing caveat is that MOR-backed MERGE INTO only stays fast if an operator runs periodic copy-on-write compaction (Iceberg's rewrite_data_files action or, at exabyte scale, systems/deltacat).

(Source: sources/2025-09-30-expedia-prefer-merge-into-over-insert-overwrite)
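The compaction step that keeps MOR reads fast can be sketched as: fold pending position deletes into the data, then bin-pack the surviving rows into fewer, larger files. A hypothetical sketch, not the rewrite_data_files implementation: it packs by row count where real compaction targets a byte size, and it ignores sort-based layouts (rewrite_data_files also offers a sort strategy).

```python
from typing import Dict, List, Set


def compact(files: Dict[str, List[dict]], deletes: Dict[str, Set[int]],
            target_rows: int) -> List[List[dict]]:
    """Apply position deletes, then bin-pack survivors into files of at most target_rows rows."""
    survivors = [
        row
        for path in sorted(files)  # deterministic order for the sketch
        for pos, row in enumerate(files[path])
        if pos not in deletes.get(path, set())
    ]
    return [survivors[i:i + target_rows] for i in range(0, len(survivors), target_rows)]


small_files = {
    "f1.parquet": [{"id": 0}, {"id": 1}],
    "f2.parquet": [{"id": 2}],
    "f3.parquet": [{"id": 3}, {"id": 4}, {"id": 5}],
}
pending_deletes = {"f3.parquet": {0}}  # row id 3 was deleted merge-on-read style
packed = compact(small_files, pending_deletes, target_rows=4)
print([[r["id"] for r in f] for f in packed])  # [[0, 1, 2, 4], [5]]
```

Three small files plus pending deletes become two files with nothing left to merge at read time, which is the read-side recovery the MERGE INTO prescription depends on.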

Seen in (additional)

  • sources/2025-09-30-expedia-prefer-merge-into-over-insert-overwrite — Expedia Group Tech primer naming MERGE INTO (row-level, MOR) as the default for Iceberg incremental updates and INSERT OVERWRITE (partition-level) as the legacy blunt instrument; qualitative trade-offs only, no numbers. Source for concepts/merge-on-read and patterns/merge-into-over-insert-overwrite on this wiki.
  • sources/2025-11-04-datadog-replication-redefined-multi-tenant-cdc-platform — Iceberg as one of the downstream sink destinations in Datadog's managed multi-tenant CDC platform. Postgres → Iceberg pipelines are cited for "scalable, event-driven analytics" — one of five sink classes generalised from the original Postgres-to-search seed pipeline. No Iceberg-internals detail in the post (the reference is to Apache Flink CDC's Iceberg connector); value is the data point that CDC-to-Iceberg is a first-class replication shape at Datadog scale alongside CDC-to-search and DB-to-DB replication.
  • sources/2026-02-24-pinterest-piqama-pinterest-quota-management-ecosystem: Telemetry + billing substrate for a quota-management platform. Pinterest's Piqama lands pre-aggregated quota-enforcement and usage statistics in Iceberg on S3; a separate auto-rightsizing service consumes the Iceberg data (via Presto, Iceberg directly, or user-defined sources) to periodically recompute quota values against organic-growth / burst / underutilization strategies, then writes them back via the Piqama control-plane API. The Iceberg data is also the governance source for the budget-exceedance → X% quota haircut loop. Canonical wiki instance of Iceberg-as-telemetry-and-billing-substrate, distinct from the table-format / CDC-sink / MERGE INTO instances. Pre-aggregation at write time is the explicit design choice that keeps storage cost bounded despite fleet-wide telemetry volume.

  • sources/2025-01-21-redpanda-implementing-the-medallion-architecture-with-redpanda: Iceberg as the lakehouse table format underneath the Medallion Architecture, written into directly by a streaming broker. Redpanda's Iceberg topics make a single logical entity addressable as both a Kafka-protocol topic and an Iceberg table: the broker projects row-oriented records into Parquet, writes to object storage, and registers the snapshot with an external Iceberg REST catalog (Databricks Unity, Snowflake Polaris). Downstream engines (ClickHouse, Snowflake, Databricks, Dremio) query the tables directly. Canonical wiki datum for Iceberg-as-streaming-broker-Bronze-sink (patterns/streaming-broker-as-lakehouse-bronze-sink), distinct from the table-format / CDC-sink / MERGE INTO / telemetry-substrate instances, and for Iceberg's Flink sink connector enabling streaming Bronze→Silver→Gold transitions (patterns/stream-processor-for-real-time-medallion-transitions). Pedagogy altitude; no latency / throughput / commit-cadence / compaction-ownership numbers. The last of these (Iceberg's externalisation-cost question) is specifically elided in the pedagogy framing.

  • sources/2025-04-07-redpanda-251-iceberg-topics-now-generally-available: GA release disclosure for the streaming-broker-as-Iceberg-writer shape (Redpanda 25.1, multi-cloud on AWS/Azure/GCP). Extends the 2025-01 Iceberg-Topics pedagogy entry above with four GA-grade table-management capabilities (custom hierarchical bucketed partitioning, built-in DLQ for invalid records, full Iceberg-spec-compliant schema evolution, automatic snapshot expiry) and five catalog-integration capabilities (secure REST catalog sync via OIDC+TLS against Snowflake Open Catalog / Apache Polaris / Databricks Unity / AWS Glue, transactional writes via the commit protocol, automatic table discovery, built-in object-store catalog fallback, and a tunable workload-management knob for the snapshot-vs-live-topic lag ceiling). The GA disclosure internalises snapshot expiry as a broker-owned loop, partially closing the wiki's prior-ingest externalisation-cost caveat (snapshot-expiry ownership resolved; small-file compaction ownership still open as of this post). Also canonicalises patterns/broker-native-iceberg-catalog-registration as the mechanism behind the "zero-ETL" streaming-to-lakehouse integration framing.

  • sources/2025-06-24-redpanda-why-streaming-is-the-backbone-for-ai-native-data-platforms: Iceberg as the open-format escape hatch from warehouse lock-in. Redpanda's backbone essay frames Iceberg as the mechanism that lets a single logical dataset serve both a proprietary warehouse (Snowflake) and an alternative compute engine (BigQuery, model-serving / training infrastructure) "without having to store your data twice". Names Apache Polaris explicitly as the REST-catalog choice that keeps metadata open alongside data. Positions Iceberg-Topics vs Snowpipe Streaming as the open-format-vs-proprietary-format ingestion decision. Thought-leadership altitude; no numbers; retires the warehouse-lock-in-as-inevitable framing from vendor marketing.

  • sources/2026-05-28-cloudflare-how-we-built-cloudflares-data-platform-and-an-ai-agent-on-top-of-it: Iceberg is the table format underlying Cloudflare R2 Data Catalog, the cold/warm tier of Town Lake. Cloudflare picks Iceberg specifically for "schema evolution, time travel, partition evolution, and the ability to compact data as it ages", operationalised as a recency-tiered recompaction pipeline: per-minute usage from last week → hourly from last quarter → daily beyond. "The storage cost decreases as recency does, while the data stays queryable. Parquet files in R2 are much cheaper compared to keeping the same data in an OLAP database." This makes Town Lake the first canonical wiki instance of a vendor-managed-Iceberg-on-vendor-object-store shape (i.e., R2 Data Catalog as a managed Iceberg service on R2, paralleling AWS S3 Tables on S3 and Databricks Unity-managed Iceberg), with the Iceberg layer serving as the engine-agnostic substrate that lets Trino query R2 today and R2 SQL (Cloudflare's serverless analytics query engine) take over as it matures.
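The recency-tiered recompaction Cloudflare describes can be sketched as a rollup whose grain depends on each point's age. A hypothetical sketch: the 7-day and 90-day cut-offs approximate "last week" and "last quarter", the point shape is assumed, and a real pipeline would rewrite Parquet files in the Iceberg table rather than aggregate in memory.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple

EPOCH = datetime(1970, 1, 1)
Point = Tuple[datetime, str, float]  # (timestamp, account key, usage); hypothetical shape


def grain_for(ts: datetime, now: datetime) -> timedelta:
    """Coarser grain as data ages: minute within 7 days, hour within ~90 days, else day."""
    age = now - ts
    if age <= timedelta(days=7):
        return timedelta(minutes=1)
    if age <= timedelta(days=90):  # "last quarter", approximated as 90 days
        return timedelta(hours=1)
    return timedelta(days=1)


def floor_to(ts: datetime, grain: timedelta) -> datetime:
    """Truncate a timestamp to the start of its grain-sized bucket."""
    return EPOCH + ((ts - EPOCH) // grain) * grain


def recompact(points: Iterable[Point], now: datetime) -> Dict[Tuple[datetime, str], float]:
    """Re-aggregate usage to the coarsest grain each point's age allows; totals are preserved."""
    rolled: Dict[Tuple[datetime, str], float] = defaultdict(float)
    for ts, key, value in points:
        rolled[(floor_to(ts, grain_for(ts, now)), key)] += value
    return dict(rolled)


now = datetime(2026, 5, 28)
points = [
    (datetime(2026, 5, 27, 10, 0, 30), "acct1", 1.0),  # recent: kept per minute
    (datetime(2026, 5, 27, 10, 0, 45), "acct1", 2.0),
    (datetime(2026, 4, 1, 10, 5), "acct1", 2.5),       # this quarter: rolled up per hour
    (datetime(2026, 4, 1, 10, 55), "acct1", 2.5),
    (datetime(2026, 1, 1, 3, 15), "acct1", 3.0),       # older: rolled up per day
    (datetime(2026, 1, 1, 22, 40), "acct1", 4.0),
]
for bucket, total in sorted(recompact(points, now).items()):
    print(bucket, total)
```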

Iceberg v3 GA on Databricks (2026-05-28)

The 2026-05-28 announcement marks the General Availability of Iceberg v3 on Databricks, with three new format-level primitives applying across managed Iceberg, foreign Iceberg, and UniForm-enabled managed tables:

  • Deletion vectors: a file-level row-delete representation that can "accelerate updates, merges, and deletes" without rewriting whole data files. This is the merge-on-read family applied to deletes, sharply reducing write amplification for sparse-delete workloads.
  • Row tracking: stable per-row identity that survives compaction and rewrite; the substrate primitive for "more efficient incremental processing" (CDC consumers, materialized-view incremental refresh, and ML feature recomputation no longer over-emit on no-op compaction commits).
  • VARIANT type: "a standard representation for semi-structured data" (JSON-shaped values whose internal structure is preserved and queryable, without requiring a schema at write time).
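Deletion vectors and row tracking can be illustrated together. A hypothetical in-memory sketch based on our reading of the v3 spec: a deletion vector is modelled as a set of deleted row positions per data file (the spec stores them as roaring bitmaps in Puffin files), and row tracking as a first_row_id per file, with each row inheriting first_row_id plus its position until a rewrite writes the IDs out explicitly. Class and function names are invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class DataFile:
    values: List[str]
    first_row_id: int
    explicit_row_ids: Optional[List[int]] = None  # written out only when rows are carried over
    deletion_vector: Set[int] = field(default_factory=set)  # deleted positions in this file

    def row_ids(self) -> List[int]:
        if self.explicit_row_ids is not None:
            return self.explicit_row_ids
        return [self.first_row_id + pos for pos in range(len(self.values))]

    def live(self) -> List[Tuple[int, str]]:
        return [(rid, v) for pos, (rid, v) in enumerate(zip(self.row_ids(), self.values))
                if pos not in self.deletion_vector]


def delete_value(f: DataFile, value: str) -> None:
    """Deletion-vector delete: mark positions; the data file itself is not rewritten."""
    for pos, v in enumerate(f.values):
        if v == value:
            f.deletion_vector.add(pos)


def compact(files: List[DataFile], next_first_row_id: int) -> DataFile:
    """Rewrite live rows into one file, carrying their row ids so identity survives."""
    survivors = [pair for f in files for pair in f.live()]
    return DataFile(values=[v for _, v in survivors],
                    first_row_id=next_first_row_id,
                    explicit_row_ids=[rid for rid, _ in survivors])


f1 = DataFile(["a", "b", "c"], first_row_id=0)
f2 = DataFile(["d", "e"], first_row_id=3)
delete_value(f1, "b")
before = {rid for f in (f1, f2) for rid, _ in f.live()}
merged = compact([f1, f2], next_first_row_id=5)
after = {rid for rid, _ in merged.live()}
print(before == after)  # True: the compaction commit introduces no "new" rows
```

An incremental consumer keyed on row id can therefore treat the compaction commit as a no-op, which is the over-emission problem row tracking is meant to remove.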

Cross-format claim: "these features also work seamlessly across both Delta and Iceberg tables, enabling interoperability without rewriting data." This positions Iceberg v3 as the parity-with-Delta milestone for the three primitives Delta had earlier; the canonical wiki page is systems/iceberg-v3.

Forward-looking disclosure: Iceberg v4 (the next major version) will introduce an "adaptive metadata tree" core metadata structure, and Delta 5.0 will adopt the same structure — see concepts/format-co-evolution-iceberg-delta. The market direction is two open table formats sharing core metadata internals, with catalog-side bridges (Iceberg REST + Delta Sharing now bi-format) making the format choice operational rather than strategic.

(Source: sources/2026-05-28-databricks-advancing-apache-iceberg-on-databricks-iceberg-v3-ga-open-sharing-and-unified-governance)

Catalog-side primitives (Unity Catalog Iceberg-native, 2026-05-28)

Beyond the v3 format release, the same announcement consolidates a catalog-side surface area that turns Unity Catalog into a fully Iceberg-native catalog:

  • Managed Iceberg (GA): UC creates, reads, writes, and governs Iceberg tables directly, with Predictive Optimization and Liquid Clustering applying to Iceberg-format tables (extending the external-engine-write-to-managed-table shape, canonicalised on the wiki for Delta, to Iceberg as the format).
  • Foreign Iceberg (GA) + Credential Vending for Foreign Iceberg (GA): UC governs Iceberg tables managed in external catalogs (AWS Glue, Snowflake Horizon, Hive Metastore, Apache Polaris, Salesforce Data Cloud, Google Cloud Lakehouse, Palantir, Workday) while leaving data in place, and mints short-lived scoped credentials for federated access. See concepts/foreign-iceberg-table.
  • External Sharing to Iceberg clients (GA): Delta Sharing now emits Iceberg REST endpoints; recipients on Snowflake / Trino / Flink / Spark consume shared data via Iceberg-compatible clients without ingestion or copies.
  • Cross-engine ABAC (Beta): UC ABAC policies are evaluated during server-side scan planning via the Iceberg REST Catalog Scan Planning API (Iceberg 1.11). The catalog returns a filtered scan plan, and the engine reads only authorised data. Compatible engines: Spark, DuckDB, or any engine implementing the Iceberg 1.11 scan-planning client. See concepts/cross-engine-abac + patterns/scan-planning-as-policy-enforcement-point.
  • Iceberg-compatible materialized views (Gated Public Preview): managed MVs exposed downstream as native Iceberg tables; forward-looking syntax CREATE MATERIALIZED VIEW my_mv USING ICEBERG.

(Source: sources/2026-05-28-databricks-advancing-apache-iceberg-on-databricks-iceberg-v3-ga-open-sharing-and-unified-governance)
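The idea behind patterns/scan-planning-as-policy-enforcement-point, from the cross-engine ABAC primitive, is that the catalog rather than the engine decides which files a scan may touch. A hypothetical sketch: the policy shape and the FileTask and ScanPlan types are invented and do not reflect the Iceberg 1.11 REST Scan Planning API or Unity Catalog's ABAC implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

Principal = Dict[str, str]
Policy = Callable[[Principal], Set[str]]  # principal attributes -> permitted partition values


@dataclass(frozen=True)
class FileTask:
    path: str
    region: str  # value of a hypothetical 'region' partition column


@dataclass
class ScanPlan:
    tasks: List[FileTask]
    row_filter: str  # residual predicate the engine must still apply (hypothetical text form)


def plan_scan(files: List[FileTask], principal: Principal, policy: Policy) -> ScanPlan:
    """Catalog-side planning: evaluate the policy before the engine learns which files exist."""
    allowed = policy(principal)
    tasks = [f for f in files if f.region in allowed]  # unauthorised files never leave the catalog
    row_filter = "region IN (" + ", ".join(repr(r) for r in sorted(allowed)) + ")"
    return ScanPlan(tasks, row_filter)


def home_region_only(principal: Principal) -> Set[str]:
    return {principal["home_region"]}


files = [FileTask("s3://example/t/region=eu/0.parquet", "eu"),
         FileTask("s3://example/t/region=us/0.parquet", "us")]
plan = plan_scan(files, {"user": "ana", "home_region": "eu"}, home_region_only)
print([t.path for t in plan.tasks])  # ['s3://example/t/region=eu/0.parquet']
```

Because enforcement happens during planning, every engine that asks the catalog for a plan gets the same filtered answer, which is what makes the policy cross-engine.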

Seen in (additional)

  • sources/2026-05-28-databricks-advancing-apache-iceberg-on-databricks-iceberg-v3-ga-open-sharing-and-unified-governance: Iceberg-v3-GA + Iceberg-native-catalog announcement at feature-roundup altitude. Fifteenth Iceberg face on the wiki (after Cloudflare R2 Data Catalog / Iceberg-as-managed-on-vendor-object-store on 2026-05-28). Thirteen-source consolidation point. Three v3 format primitives (deletion vectors, row tracking, VARIANT) scoped across managed / foreign / UniForm-enabled tables. Five catalog-side primitives (managed-Iceberg GA, foreign-Iceberg GA, external-sharing-to-Iceberg-clients GA, cross-engine ABAC Beta, Iceberg-compatible MVs Gated Preview). Forward-looking Iceberg-v4 + Delta-5.0 adaptive-metadata-tree co-evolution. Tier-3 marketing-roundup framing acknowledged in the source page; no quantitative numbers; mechanism depth deferred to spec / docs for each primitive. Architectural significance lies in consolidating named entities (v3 features + four GA / one Beta / one Preview catalog-side primitives, counting credential vending separately from foreign Iceberg) under one catalog-vendor surface area, not in deep mechanism disclosure on any single primitive.
Last updated · 542 distilled / 1,571 read