# Mercedes-Benz builds a cross-cloud data mesh with Delta Sharing and intelligent replication
Case study from Mercedes-Benz on building a cross-hyperscaler data-sharing backbone between AWS (producer) and Azure (consumer) using systems/delta-sharing + systems/unity-catalog, and cutting recurrent inter-cloud egress costs by routing bulk consumers through periodic Delta Deep-Clone replicas instead of live cross-cloud queries.
## One-paragraph summary
Mercedes-Benz's after-sales dataset (~60 TB, growing) lives on AWS in Iceberg format while most analytical consumers sit on Azure. Direct cross-cloud queries bled egress dollars; weekly full-load copies were cheap but froze data at 7 days old — too stale for warranty triage. They built a data mesh on Databricks Unity Catalog with Delta Sharing as the open cross-cloud exchange protocol, and added an intelligent-replication tier underneath: for latency-tolerant consumers, automated Sync Jobs use Delta Deep Clone to incrementally mirror the shared tables into the consumer cloud's object store (ADLS), so reads stay local. Cost tracking is plumbed into the Sync Jobs themselves so bytes transferred are billed back to the upstream data producer. Reported outcomes: 66 % egress reduction on the first 10 data products, ~93 % projected annual egress reduction scaled to 50 use cases, and freshness cadence improved from weekly to every-second-day.
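The cost trade-off in the summary can be sketched numerically: direct sharing pays egress on every cross-cloud query, while replication pays egress once per sync window and serves all subsequent reads locally. A minimal model, with all rates, volumes, and query counts as illustrative assumptions (the post publishes no per-GB prices):

```python
# Hypothetical cost model comparing direct cross-cloud reads with periodic
# replication. Every number here is an assumption for illustration, not a
# figure from the case study.

EGRESS_USD_PER_GB = 0.09  # assumed inter-cloud egress rate

def direct_share_cost(gb_scanned_per_query: float, queries_per_week: int) -> float:
    """Each query against the AWS-hosted share pays cross-cloud egress."""
    return gb_scanned_per_query * queries_per_week * EGRESS_USD_PER_GB

def replicated_cost(gb_changed_per_sync: float, syncs_per_week: float) -> float:
    """Deep Clone is incremental: each sync transfers only changed data;
    reads against the local replica then incur no egress at all."""
    return gb_changed_per_sync * syncs_per_week * EGRESS_USD_PER_GB

# 40 bulk queries/week, each scanning 50 GB, versus syncing 100 GB of changes
# every second day (3.5 syncs/week):
direct = direct_share_cost(50, 40)    # 180.0 USD/week
replica = replicated_cost(100, 3.5)   # 31.5 USD/week
savings = 1 - replica / direct        # ~0.825, i.e. roughly 82 % less egress
```

The point of the model is the shape, not the numbers: savings grow with query volume and shrink with churn rate, which is exactly why the article routes only bulk, latency-tolerant consumers through replicas.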
## Key takeaways
- Delta Sharing as the protocol, Unity Catalog as the global catalog. UC is the hub-and-spoke governance layer — one central metadata + access-control surface across metastores, regions, and clouds. Delta Sharing is the open protocol UC speaks for the actual data exchange, between UC metastores (cross-region, cross-hyperscaler) and with external partners. UC federates the AWS Glue-registered Iceberg tables so they can be shared without moving them. (Source: article §"Unity Catalog and Delta Sharing")
- Three Delta-Sharing deployment shapes on one protocol. (a) cross-cloud/cross-hyperscaler (AWS ↔ Azure, the headline case), (b) cross-region/cross-metastore inside one cloud, (c) external sharing with suppliers ("more secure than FTP or shared secrets"). Same technology, three trust boundaries. (§"Delta Sharing used in three configurations")
- Egress is the cost axis, not latency. Live cross-cloud Delta Sharing reads work — they're preferred where freshness matters — but for bulk, low-freshness-tolerant consumers on Azure, each query against AWS-hosted shares pays cross-cloud egress. The hybrid answer is not "stop sharing" but "replicate once per sync window, serve many queries locally". (§"Hybrid Approach")
- Delta Deep Clone as the replication primitive. Periodic automated Sync Jobs use Deep Clone to materialise the shared table as a local Delta table in the recipient's object store (ADLS on Azure, S3 on AWS). Deep Clone is incremental — the second sync only writes the delta, not the full 60 TB. The pattern generalises as patterns/cross-cloud-replica-cache: remote canonical store, local incrementally-rebuilt replica serving cheap local reads. (§"Hybrid Approach → Periodic Sync Job")
- Consumer-chosen freshness/cost knob. Each data product gets to pick its point on the curve: (i) direct Delta Share over the wire = freshest, highest per-read egress, (ii) replicated local copy with N-hour sync cadence = cheap reads, bounded staleness. The mesh exposes both; consumers commit to one per use case. (§"Hybrid Approach" summary)
- Cost tracking wired into the sharing tier → producer chargeback. Sync Jobs record exact bytes transferred; a daily Reporting Job aggregates that into per-Data-Product egress cost that is billed back to the upstream data producer, not the consumer. This flips the usual "who pays for bulk replication" question and makes producers internalise the cost of inefficiently-modelled data. (§"Cost Tracking and Attribution" → patterns/chargeback-cost-attribution)
- DDX (Dynamic Data eXchange) as the self-service control surface. A meta-catalog that automates permission grants (via microservices and Databricks APIs), Sync-Job lifecycle, and the sharing/replication workflows — so product teams don't operate shares by hand. Deployed via Databricks Asset Bundles + YAML in Azure DevOps (a full DevOps-style release pipeline for the mesh itself). (§"Technical Implementation")
- GDPR via Delta Lake `VACUUM` on replicas. Right-to-be-forgotten on the producer side has to propagate to every replica. The replicated Delta tables run `VACUUM` as part of the sync contract, so deletes on the source show up in downstream ADLS stores within the sync window. Without this, intelligent replication would quietly become a compliance violation. (§"GDPR and Governance")
- Format-mismatch resolution at the sharing boundary. Source tables were Iceberg (what AWS Glue held); consumers expected Delta. UC + Delta Sharing did the translation at federation time, so the producer didn't have to rewrite into Delta first. This is one of the practical reasons UC's federation matters — otherwise the mesh would force a ground-truth format choice on every producer. (§"Unity Catalog and Delta Sharing")
- Results the architecture was optimising for (reported by the authors): 66 % egress cost drop on first 10 data products, ~93 % annual egress reduction projected across 50 use cases, freshness cadence weekly → every-second-day, serverless Databricks Jobs running the sync tier "more or less without any problem and without any intervention". Operational stability is claimed; quantitative baselines for "without any intervention" aren't shared. (§"Quantitative Benefits and ROI")
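The chargeback mechanism in the takeaways above reduces to a small aggregation: Sync Jobs log bytes transferred per run, and a daily Reporting Job rolls those logs up into an egress bill attributed to each upstream producer. A minimal sketch, with the log record shape and the per-GB rate as assumptions (the post describes the flow but shows no schema):

```python
# Sketch of producer chargeback: aggregate Sync-Job byte logs into a daily
# egress bill per producing team. Record fields and the rate are assumed.
from collections import defaultdict

EGRESS_USD_PER_GB = 0.09  # assumed rate; the article publishes none

def producer_chargeback(sync_logs: list[dict]) -> dict[str, float]:
    """Sum bytes moved per producer and convert to an egress bill in USD."""
    bill: dict[str, float] = defaultdict(float)
    for run in sync_logs:
        gb = run["bytes_transferred"] / 1e9
        bill[run["producer"]] += gb * EGRESS_USD_PER_GB
    return dict(bill)

logs = [
    {"data_product": "after_sales_claims", "producer": "after-sales-team",
     "bytes_transferred": 200e9},
    {"data_product": "after_sales_parts", "producer": "after-sales-team",
     "bytes_transferred": 50e9},
]
bill = producer_chargeback(logs)  # roughly {"after-sales-team": 22.5}
```

Billing the producer rather than the consumer is the interesting design choice: a team that ships a bloated, poorly modelled data product sees the replication cost land on its own budget.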
## Numbers
| Quantity | Value | Notes |
|---|---|---|
| After-sales dataset size | ~60 TB | Subset serving dozens of Azure use cases |
| Egress cost reduction (first 10 data products) | 66 % | Weekly egress ≈ two-thirds cheaper |
| Projected annual egress reduction (50 use cases) | ~93 % | Same calculation method, scaled |
| Freshness cadence | weekly → every second day | Bounded by sync cadence, not protocol |
| End-to-end time to ship the solution | "a few weeks" | First version, by the Mercedes-Benz team |
## Architectural shape (ASCII)
AWS (producer side)
┌──────────────────────────────────────────────────────────────┐
│ Iceberg tables ◄──── AWS Glue catalog ◄──── [Unity Catalog │
│ (after-sales, ~60 TB, growing) federation] │
│ │ │
│ Delta Sharing │
│ (open │
│ protocol) │
└───────────────────────────┬──────────────────────────┬───────┘
│ │
direct live read Sync Job
(freshness, (Delta Deep Clone,
costlier per read) incremental, scheduled)
│ │
┌───────────────────────────▼──────────────────────────▼───────┐
│ Azure (consumer side) │
│ │
│ Latency-sensitive Latency-tolerant │
│ consumers ◄── Delta Sharing consumers ◄── local │
│ Delta table │
│ in ADLS │
│ │
│ Governance: [Unity Catalog (recipient metastore)] │
└──────────────────────────────────────────────────────────────┘
DDX Orchestrator: permissions / Sync Jobs / workflows
DABs + Azure DevOps YAML: deploys the whole thing
Reporting Job: bytes → $ → producer chargeback
VACUUM: enforces GDPR delete propagation on replicas
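Per table, the sync tier in the diagram boils down to two Delta Lake SQL statements: an incremental `DEEP CLONE` into the consumer-side store, then a `VACUUM` so producer-side deletes actually stop being recoverable from the replica. A hedged sketch that just generates those statements (table names and retention window are made up; the post shows no job code, and a real job would submit these via something like `spark.sql(...)`):

```python
# Generates the two per-table statements a scheduled Sync Job would run.
# CREATE OR REPLACE TABLE ... DEEP CLONE and VACUUM ... RETAIN are real Delta
# Lake SQL; all identifiers and the retention default are assumptions.

def sync_statements(source_table: str, replica_table: str,
                    retain_hours: int = 168) -> list[str]:
    return [
        # Incremental by design: repeating the clone copies only data files
        # that changed since the previous sync, not the full table.
        f"CREATE OR REPLACE TABLE {replica_table} DEEP CLONE {source_table}",
        # Physically drop data files no longer referenced by the replica, so
        # GDPR deletes on the source propagate within the retention window.
        f"VACUUM {replica_table} RETAIN {retain_hours} HOURS",
    ]

stmts = sync_statements("aws_share.after_sales.claims",
                        "azure_cat.after_sales.claims_replica")
```

Note the ordering: the clone first brings the replica up to the source's current state (deleted rows vanish from the live table view), and the `VACUUM` then removes the underlying files that time travel could otherwise still read.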
## Caveats / what the post doesn't say
- No numbers for the direct-share fast path. We're told latency-sensitive workloads still go direct, but no throughput or latency figures are given for them, so the "cost vs freshness" trade-off stays qualitative.
- Sync cadence is "every second day" by example, not policy. No description of how cadence is chosen per data product, or what the machinery looks like when two consumers want different cadences on the same share.
- Compute cost of the Sync Jobs themselves is mentioned as "tracked" by the cost dashboard, but not quantified against the egress savings. The 66 % / 93 % figures are egress reductions, not total-cost-of-ownership reductions.
- "Serverless Databricks Jobs … without any intervention" is self-reported operational stability, no incident data or SLO figures.
- Single-producer, single-direction framing. The post is mostly about after-sales AWS → Azure. Multi-producer mesh dynamics (e.g. consumer-of-A-that's-also-producer-to-B, replication graph cycles) aren't discussed.
- Iceberg → Delta federation cost is described as "UC does it" but no detail on write amplification, metadata compatibility caveats, or schema-evolution behaviour across the boundary.
- Vendor blog, vendor product. Cost numbers come from Mercedes-Benz internal calcs published by Databricks marketing; no independent validation. Tier-3 source caveats apply.