SYSTEM Cited by 2 sources

Delta Sharing

Delta Sharing is an open protocol (originated at Databricks, 2021) for secure data exchange between parties without copying the data first. It is the exchange layer Databricks positions under systems/unity-catalog: Unity Catalog (UC) is the catalog / governance plane, Delta Sharing is the bytes-on-the-wire plane between UC metastores or with any conforming external client.

Role in the Mercedes-Benz case study

Mercedes-Benz uses Delta Sharing in three deployment shapes, all on one protocol:

  1. Cross-cloud / cross-hyperscaler — AWS (provider metastore) ↔ Azure (recipient metastore). The headline case; bridges after-sales data from AWS to dozens of Azure consumers.
  2. Cross-region / cross-metastore inside one cloud — when regions or metastores inside one cloud need to exchange data products.
  3. External sharing with partners / suppliers — framed as a more secure alternative to FTP drops or shared-secret exchange; recipient may or may not be on Databricks themselves.

Same wire protocol, three trust boundaries. (Source: sources/2026-04-20-databricks-mercedes-benz-cross-cloud-data-mesh)

Properties that made it fit

  • Open. Open-source protocol + multiple client implementations (Databricks-native, pandas, Spark, Python, etc.); this blunts the proprietary lock-in objection when talking to external partners.
  • Incremental updates. A share isn't a snapshot copy; recipients can pull deltas as new data lands on the provider side. This is the property that makes the patterns/cross-cloud-replica-cache viable — without incremental semantics, a sync window would be "re-download 60 TB".
  • UC-federated sources. Mercedes-Benz's tables were Iceberg on AWS Glue; UC federates them so they can be shared as Delta Sharing tables without the producer re-writing into Delta first. The format translation happens at the sharing boundary.
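
As a rough recipient-side illustration of the "open" and "incremental" properties above, the sketch below uses the open-source delta-sharing Python connector. The profile file name, share, schema, and table names are hypothetical, and the change-feed read assumes the provider has shared the table with its change data feed enabled; it is not a description of how Mercedes-Benz consumes its shares.

```python
from typing import Optional


def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile>#<share>.<schema>.<table>' address the connector expects."""
    return f"{profile_path}#{share}.{schema}.{table}"


def read_snapshot(url: str):
    """Load the current state of a shared table as a pandas DataFrame."""
    import delta_sharing  # pip install delta-sharing; imported lazily

    return delta_sharing.load_as_pandas(url)


def read_changes(url: str, starting_version: int, ending_version: Optional[int] = None):
    """Pull only the changes committed from `starting_version` onward.

    Assumption: the provider shares the table with change data feed enabled.
    """
    import delta_sharing

    return delta_sharing.load_table_changes_as_pandas(
        url, starting_version=starting_version, ending_version=ending_version
    )


# Hypothetical profile file and table coordinates, for illustration only.
url = table_url("config.share", "aftersales_share", "default", "orders")
print(url)
```

A recipient would call read_snapshot(url) once and then read_changes(url, starting_version=...) on later runs, so that each pull moves only what changed since the last one.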

Pairing with Deep Clone for egress control

A direct Delta Sharing read across clouds is a live cross-cloud query, so every read pays egress. For bulk, latency-tolerant consumers, the pattern Mercedes-Benz implemented is:

Provider (AWS) ─ Delta Share ─► Sync Job (Delta Deep Clone, incremental) ─► Local Delta replica (ADLS, Azure)
                                                                             └► Local consumers (no cross-cloud hop per query)

Each sync pays egress once per window; all subsequent reads are local. See patterns/cross-cloud-replica-cache for the general shape.
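
The egress argument can be made concrete with a small back-of-the-envelope model. All numbers below are hypothetical and chosen only to show the shape of the comparison; they are not figures from the case study.

```python
def direct_read_egress_tb(reads_per_window: int, tb_per_read: float) -> float:
    """Cross-cloud volume when every consumer query reads through the share."""
    return reads_per_window * tb_per_read


def replica_egress_tb(changed_tb_per_window: float) -> float:
    """Cross-cloud volume when one incremental sync feeds a local replica."""
    return changed_tb_per_window


# Hypothetical workload for a single sync window.
reads = 200        # consumer queries in the window
tb_per_read = 0.5  # data scanned across clouds per direct query
changed_tb = 2.0   # data that actually changed on the provider side

direct = direct_read_egress_tb(reads, tb_per_read)
replica = replica_egress_tb(changed_tb)
print(f"direct: {direct} TB, replica: {replica} TB, ratio: {direct / replica}x")
```

Under these assumed numbers the replica moves 2 TB across clouds per window instead of 100 TB. The gap grows with the number of local readers, because replica egress depends only on how much data changed.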

Bi-format recipient surface (2026-05-28)

Delta Sharing was historically Delta-native on the recipient side — recipients consumed the protocol via Delta-compatible clients (Spark, pandas, the open-source Delta Sharing client libraries). The 2026-05-28 announcement extends the recipient surface to Iceberg clients:

"Databricks customers can share live data externally with any recipient that supports the Iceberg REST Catalog API. Recipients can query shared data from Iceberg-compatible clients such as Snowflake, Trino, Flink, and Spark, without manual ingestion or copies."

(Source: sources/2026-05-28-databricks-advancing-apache-iceberg-on-databricks-iceberg-v3-ga-open-sharing-and-unified-governance)

Two state changes ship together:

  • External Sharing to Iceberg clients (GA) — the protocol now emits Iceberg REST Catalog endpoints alongside the existing Delta Sharing endpoints. Recipients that don't run Databricks but do run an Iceberg-compatible engine (Snowflake / Trino / Flink / Spark) can consume shared data live, without manual ingestion or copies. The provider continues to manage access, audit, and governance through Unity Catalog.
  • External Sharing of Foreign Iceberg tables (Public Preview) — providers can include Iceberg tables that are managed outside Databricks but registered in UC in a Delta Sharing share. UC becomes the sharing layer for both managed and foreign Iceberg tables, "while keeping data in place and governance centralized." The wire protocol bridges the foreign-Iceberg → Delta-Sharing-share → Iceberg-REST-recipient path without rewriting data.

The combined effect: Delta Sharing is now bi-format on the recipient side (Delta-compatible or Iceberg-compatible clients consume the same shares), and bi-format on the provider side (managed Delta, managed Iceberg, or foreign Iceberg can be the source of a share). UC remains the single governance plane across all six source × recipient combinations.
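
For the Iceberg side of the recipient surface, a minimal sketch with PyIceberg's REST catalog support might look like the following. The endpoint URI, token, warehouse value, and table identifier are placeholders; the real values, and whether a warehouse property is needed at all, come from the provider's recipient credentials and are assumptions here.

```python
from typing import Optional


def rest_catalog_properties(uri: str, token: str, warehouse: Optional[str] = None) -> dict:
    """Assemble PyIceberg REST catalog properties for a shared catalog."""
    props = {"type": "rest", "uri": uri, "token": token}
    if warehouse is not None:
        props["warehouse"] = warehouse
    return props


def open_shared_table(properties: dict, identifier: str):
    """Load a shared table through the Iceberg REST Catalog API."""
    from pyiceberg.catalog import load_catalog  # pip install pyiceberg; imported lazily

    catalog = load_catalog("shared", **properties)
    return catalog.load_table(identifier)


# Placeholder values; a real recipient receives these from the provider.
props = rest_catalog_properties(
    uri="https://example.invalid/iceberg-rest",
    token="<recipient-token>",
)
print(sorted(props))
```

With real credentials, open_shared_table(props, "schema.table").scan().to_pandas() would read the shared data into a DataFrame without an intermediate copy.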

Seen in

  • sources/2026-05-28-databricks-advancing-apache-iceberg-on-databricks-iceberg-v3-ga-open-sharing-and-unified-governance: bi-format recipient + foreign-Iceberg sharing announcement. Two state changes ship together: External Sharing to Iceberg clients reaches GA (Snowflake / Trino / Flink / Spark consume Delta Sharing shares via the Iceberg REST Catalog API); External Sharing of Foreign Iceberg tables in Public Preview (providers share Iceberg tables managed outside Databricks but registered in UC). The architectural property: protocol becomes bi-format on both sides — Delta-compatible or Iceberg-compatible recipients, managed-Delta or managed-Iceberg or foreign-Iceberg sources. UC remains the single governance plane. Generalises the "three deployment shapes, one wire protocol" framing from the Mercedes-Benz case study to "three source shapes × two recipient shapes, one wire protocol". Tier-3 marketing-roundup framing acknowledged in the source page; mechanism depth (wire-protocol changes for Iceberg-REST emission, recipient-discovery for bi-format clients) deferred to docs.
  • Zalando Partner Tech (Data Foundation pillar) deploys Delta Sharing via Databricks' managed service to share 200+ datasets / up to 200TB with thousands of commercial partners across three business models (wholesale / Partner Program / Connected Retail), replacing a fragmented SFTP/CSV/API legacy path that cost partners ~1.5 FTE/month in extraction overhead. Load-bearing architectural framings surfaced: (a) open protocol chosen explicitly for partner-client-ecosystem breadth — Spark + pandas + Power BI + Excel clients unify access across a three-tier partner segmentation (large / medium / small) (concepts/segmented-partner-data-access-tiering + patterns/open-protocol-over-proprietary-exchange); (b) managed over self-hosted — Databricks' managed service chosen explicitly with load-bearing quote "operational excellence often trumps technical purity"; (c) three-primitive deployment model — Share (logical container) + Recipient (per-partner digital identity) + Activation Link (one-time-use credential bootstrap URL), canonicalised as patterns/recipient-per-partner-share-per-dataset-group; (d) zero-copy as a first-class property — partners read live Delta tables without copy/sync tax (concepts/zero-copy-data-sharing-protocol); (e) token-based auth via activation links today, with OIDC federation named as next-step; (f) cross-team dependency graph (Data Foundation + AppSec + IAM) up front, framed as non-negotiable for any new external-data-access surface; (g) pilot-to-platform via internal-demand signal — Partner Tech pilot grew into an org-wide recipient-management platform after other Zalando teams reached out (patterns/pilot-to-platform-via-internal-demand). First wiki ingest of Delta Sharing at B2B partner-data altitude (prior coverage was cross-cloud data mesh). Deployment shape: systems/zalando-partner-data-sharing-platform.
  • sources/2026-04-20-databricks-mercedes-benz-cross-cloud-data-mesh — primary case study; cross-cloud AWS→Azure data mesh, three deployment shapes, pairing with systems/delta-lake Deep Clone for egress-bounded consumers.