
Delta Lake

Delta Lake is an open-source concepts/open-table-format built on systems/apache-parquet files in object storage. It is one of the three canonical OTFs alongside systems/apache-iceberg and Apache Hudi, and the table format native to Databricks' Data Intelligence Platform.

Minimum viable framing for this wiki: it plays the same architectural role as Iceberg — ACID transactions, schema evolution, time-travel, snapshot-based metadata over immutable columnar files. See concepts/open-table-format for the shared shape and the gap the format class fills above concepts/immutable-object-storage.
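The "snapshot-based metadata over immutable columnar files" shape can be sketched without any Delta dependency. Delta derives the current table state by replaying add/remove file actions from an ordered commit log (`_delta_log`); the commit shapes and file names below are illustrative, not the real JSON schema:

```python
def replay_log(commits):
    """Reconstruct a snapshot (the set of live data files) by replaying
    add/remove actions from ordered commit entries, the way Delta derives
    table state from its _delta_log."""
    live = set()
    for commit in commits:
        for action in commit:
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live

# Two hypothetical commits: the first adds two Parquet files, the second
# rewrites one of them (remove + add), e.g. after an UPDATE.
commits = [
    [{"add": {"path": "part-0000.parquet"}},
     {"add": {"path": "part-0001.parquet"}}],
    [{"remove": {"path": "part-0001.parquet"}},
     {"add": {"path": "part-0002.parquet"}}],
]
print(sorted(replay_log(commits)))       # current snapshot
print(sorted(replay_log(commits[:1])))   # time travel: replay fewer commits
```

Time travel falls out for free: replaying a prefix of the log yields an older snapshot, and because the underlying Parquet files are immutable, both versions remain readable until the old files are vacuumed.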

Features cited in ingested sources

  • Deep Clone. Incrementally materialises a snapshot of a source table (another Delta table, or a Delta Sharing share) as a new, physically separate Delta table in the clone's object store. Subsequent deep-clone runs transfer only the delta since the previous clone. This is the replication primitive Mercedes-Benz's cross-cloud Sync Jobs use — it's the thing that makes patterns/cross-cloud-replica-cache economically viable at 60 TB.
  • VACUUM. Physically deletes data files that are no longer referenced by the current table version, once the retention period (7 days by default) has elapsed. Mercedes-Benz leans on VACUUM on the replicated Delta tables to enforce the GDPR right to be forgotten: deletions on the source propagate through the next Deep Clone sync; the superseded files are then vacuumed out of ADLS on the recipient side.
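The incremental property of Deep Clone is a set difference over the immutable file lists: only files in the source snapshot that the clone doesn't already hold cross the wire. A minimal sketch of that sync-planning step (function and file names are hypothetical, not the Databricks API):

```python
def plan_deep_clone(source_snapshot, clone_manifest):
    """Plan an incremental deep-clone sync: copy only files the clone is
    missing, and drop files the source no longer references. Because data
    files are immutable, set difference on paths is sufficient."""
    to_copy = sorted(set(source_snapshot) - set(clone_manifest))
    to_drop = sorted(set(clone_manifest) - set(source_snapshot))
    return to_copy, to_drop

# First sync: the clone is empty, so everything is copied.
src_v1 = ["a.parquet", "b.parquet", "c.parquet"]
print(plan_deep_clone(src_v1, []))

# Second sync, after the source compacted b + c into d: only d moves.
src_v2 = ["a.parquet", "d.parquet"]
print(plan_deep_clone(src_v2, src_v1))
```

This is why the pattern stays economical at 60 TB: steady-state syncs pay for the churn since the last clone, not for the full table.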

(Source: sources/2026-04-20-databricks-mercedes-benz-cross-cloud-data-mesh)
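The VACUUM eligibility rule above can be sketched as a retention filter: a file is deletable only if the current snapshot no longer references it and it was removed before the retention cutoff. This is a conceptual sketch, not Delta's implementation (the 168-hour default matches Delta's 7-day retention; everything else is illustrative):

```python
from datetime import datetime, timedelta

def vacuum_candidates(removed_files, live_files, now, retention_hours=168):
    """Return files safe to physically delete: not referenced by the
    current snapshot, and removed before the retention cutoff (7 days by
    default, so time travel within the window keeps working)."""
    cutoff = now - timedelta(hours=retention_hours)
    return sorted(
        path for path, removed_at in removed_files.items()
        if path not in live_files and removed_at <= cutoff
    )

now = datetime(2026, 4, 20)
removed_files = {
    "old-user-row.parquet": datetime(2026, 4, 1),    # removed 19 days ago
    "recent-rewrite.parquet": datetime(2026, 4, 19), # removed yesterday
}
print(vacuum_candidates(removed_files, live_files=set(), now=now))
```

The retention window is the GDPR-relevant knob: a source-side delete only becomes a physical delete on the recipient once both the next Deep Clone sync has landed and the retention period has passed.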
