
CONCEPT Cited by 4 sources

Open Table Format

An open table format (OTF) is a metadata layer over columnar data files on object storage that adds table semantics — atomic row-level updates, schema evolution, time-travel / snapshot versioning — to a data model that is fundamentally whole-object and immutable. Canonical implementations: Apache Iceberg, Delta Lake, Apache Hudi.

(Source: sources/2025-03-14-allthingsdistributed-s3-simplicity-is-table-stakes)

The gap OTFs fill

concepts/immutable-object-storage offers strong primitives (replication, durability, versioning, simple semantics) but stops at the object boundary. Analytical workloads want table-level primitives — mutate a row, add a column, query "as of yesterday". Implementing these by rewriting entire tables on every change is prohibitive.

The OTF pattern decouples:

  • Data files (typically systems/apache-parquet) — immutable, columnar, written once.
  • Metadata layer — a snapshot manifest describing "which set of data files constitutes the table at version N". Mutations produce a new manifest that references mostly the same files plus deltas.
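
As a rough illustration of this split, the sketch below models a snapshot as nothing more than a version number plus a set of file paths. The `Snapshot` class, the `commit` helper, and the file names are hypothetical and do not correspond to any real format's metadata layout; they only show that a mutation yields a new file list while the existing data files stay untouched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical manifest: the set of data files that make up one table version."""
    version: int
    data_files: frozenset  # paths of immutable, write-once data files


def commit(parent: Snapshot, added=(), removed=()) -> Snapshot:
    """Derive the next snapshot from its parent; no data file is edited in place."""
    files = (parent.data_files - frozenset(removed)) | frozenset(added)
    return Snapshot(version=parent.version + 1, data_files=files)


v1 = Snapshot(1, frozenset({"part-000.parquet", "part-001.parquet"}))
v2 = commit(v1, added={"delta-002.parquet"})

# Both versions reference the same bulk files; only the delta is new.
shared = v1.data_files & v2.data_files
print(sorted(shared))  # ['part-000.parquet', 'part-001.parquet']
```

Because `v1` is never modified, it remains a valid description of the older table state after `v2` is created.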

Properties this enables

  • Row-level insert/update/delete — expressed as a new snapshot that adds delta files, not by rewriting bulk Parquet.
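  • Schema evolution — the manifest carries the logical schema; data files carry physical column layouts; readers reconcile the two at scan time (for example, returning nulls for a column that was added after an older file was written).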
  • Time travel / branching — snapshots are addressable by id or timestamp; "read the table as of point X" is just following an older manifest pointer.
  • Atomic multi-file commits — the commit is a single metadata update, even though it references many data files.
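
The atomic-commit and time-travel properties can be sketched with a single pointer that is swapped conditionally. This is a minimal in-memory model under stated assumptions: `TablePointer` and its methods are hypothetical, and the in-process lock stands in for whatever conditional-update mechanism a real catalog or storage layer provides.

```python
import threading


class TablePointer:
    """Hypothetical stand-in for a catalog entry that names the current snapshot."""

    def __init__(self):
        self._lock = threading.Lock()
        self._current = 0
        self.history = {0: frozenset()}  # snapshot id -> set of data files

    def current(self):
        with self._lock:
            return self._current

    def compare_and_swap(self, expected, new_files):
        """Publish a new snapshot only if no one committed since `expected`."""
        with self._lock:
            if self._current != expected:
                return False  # the writer's base is stale; it must retry
            new_id = expected + 1
            self.history[new_id] = frozenset(new_files)
            self._current = new_id  # the single metadata update
            return True

    def read(self, snapshot_id=None):
        """Read the latest snapshot, or an older one for time travel."""
        with self._lock:
            sid = self._current if snapshot_id is None else snapshot_id
            return self.history[sid]


table = TablePointer()
base = table.current()
ok = table.compare_and_swap(base, {"a.parquet", "b.parquet"})
stale = table.compare_and_swap(base, {"c.parquet"})  # same base, so it loses

print(ok, stale)              # True False
print(sorted(table.read()))   # ['a.parquet', 'b.parquet']
print(sorted(table.read(0)))  # []
```

However many files a commit references, readers see either all of them or none, because visibility changes only when the pointer moves.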

The externalisation cost

Because the metadata + compaction + GC loop runs in customer code, several operational burdens sit outside the platform:

  • Compaction — snapshot-based small updates fragment the table; periodic compaction passes are needed to keep scan performance up.
  • Garbage collection — superseded snapshots and their unreferenced files have to be reclaimed by a customer-owned job.
  • Storage-feature mismatch — object-level features (S3 Intelligent-Tiering, cross-region replication) don't know the logical table; they can tier or replicate inconsistently.
  • Access control — IAM / ACLs are typically object-scoped; the logical table isn't a policy resource.
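
The first two burdens can be sketched as plain set arithmetic over file lists. The helper names, file names, sizes, and the threshold below are all hypothetical; a real compaction job would also rewrite the data itself, which this sketch leaves out.

```python
def compact(snapshot_files, sizes, small_threshold):
    """Plan a compaction: swap all small files for one merged file.

    Returns (new_file_set, merged_name), or (unchanged_set, None) when fewer
    than two files are small. Records the merged file's size in `sizes`.
    """
    small = sorted(f for f in snapshot_files if sizes[f] < small_threshold)
    if len(small) < 2:
        return set(snapshot_files), None
    merged = "compacted-" + str(len(sizes)) + ".parquet"  # hypothetical naming
    sizes[merged] = sum(sizes[f] for f in small)
    return (set(snapshot_files) - set(small)) | {merged}, merged


def garbage(all_files, retained_snapshots):
    """Files that no retained snapshot references, and so can be deleted."""
    live = set().union(*retained_snapshots)
    return set(all_files) - live


sizes = {"big.parquet": 512, "d1.parquet": 4, "d2.parquet": 6}
v1 = {"big.parquet", "d1.parquet", "d2.parquet"}
v2, merged = compact(v1, sizes, small_threshold=64)

print(sorted(v2))                   # ['big.parquet', 'compacted-3.parquet']
print(sorted(garbage(sizes, [v1, v2])))  # [] while v1 is still retained
print(sorted(garbage(sizes, [v2])))      # ['d1.parquet', 'd2.parquet']
```

The last two lines show why the two jobs interact: compaction alone frees nothing, and the small files become reclaimable only once the older snapshot is expired, which also ends time travel to it.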

Warfield's 2025 framing: customers "were really… building their own table primitive over S3 objects." systems/s3-tables is S3 absorbing those responsibilities so the table becomes a first-class storage resource.

Trade-off axis

OTFs let customers keep their data in an open format (so any engine can read it), at the cost of running the table-management loop themselves. Managed offerings (S3 Tables, Databricks Unity Catalog, Snowflake-managed Iceberg) reduce that cost but reintroduce a form of platform coupling — typically at the catalog and compaction policy level.

Seen in

Last updated · 200 distilled / 1,178 read