Mercedes-Benz Data Mesh¶
Mercedes-Benz's internal cross-cloud data-sharing backbone, built on the Databricks Data Intelligence Platform (Unity Catalog + Delta Sharing + Delta Lake Deep Clone) to connect an AWS-hosted source of truth with dozens of Azure-side consumer use cases. The headline dataset is after-sales data — vehicle over-the-air events and workshop visits — ~60 TB and growing, used for R&D, warranty case analysis, and marketing.
(Source: sources/2026-04-20-databricks-mercedes-benz-cross-cloud-data-mesh)
Problem shape¶
- Multi-cloud + multi-region by design — each workload picks the hyperscaler that best fits. After-sales data sits on AWS (Iceberg on S3, registered in AWS Glue). Most analytics consumers sit on Azure.
- Egress is the economic bottleneck. Live cross-cloud queries against 60 TB were technically fine but cost-prohibitive for latency-tolerant use cases.
- Weekly full loads were the pre-existing mitigation. Cheap per-byte-moved, but 7-day staleness was intolerable for warranty triage.
- Format mismatch. Producer: Iceberg. Consumers: Delta. Producer rewriting everything into Delta first was not the preferred shape.
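The economics in the bullets above can be sketched as a toy egress-cost model. All figures below (weekly churn, read volume, and the $/GB rate) are illustrative assumptions, not numbers from the case study:

```python
# Toy model of the egress trade-off the mesh resolves: weekly full reloads
# vs. incremental syncs vs. direct cross-cloud reads. Every constant here
# is an illustrative assumption except the ~60 TB dataset size.

DATASET_GB = 60_000          # ~60 TB after-sales dataset
WEEKLY_CHURN = 0.02          # assumed fraction of the data changed per week
READS_GB_PER_WEEK = 5_000    # assumed direct-read volume per week
EGRESS_USD_PER_GB = 0.09     # illustrative cross-cloud egress rate

def weekly_egress_usd(strategy: str) -> float:
    """Bytes crossing the cloud boundary per week, priced at the egress rate."""
    moved_gb = {
        "full_reload": DATASET_GB,                 # pre-existing mitigation
        "incremental": DATASET_GB * WEEKLY_CHURN,  # replica sync tier
        "direct_read": READS_GB_PER_WEEK,          # latency-sensitive tier
    }[strategy]
    return moved_gb * EGRESS_USD_PER_GB

for s in ("full_reload", "incremental", "direct_read"):
    print(f"{s:12s} ~${weekly_egress_usd(s):,.0f}/week")
```

Under any plausible churn rate the incremental tier moves an order of magnitude fewer bytes than full reloads, which is the lever behind the reported 66% reduction.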
Architecture¶
```
AWS (provider)                         Azure (consumer)
─────────────────                      ──────────────────
Iceberg on S3                          Local Delta on ADLS
      │                                        ▲
      │ AWS Glue                               │ Sync Job
      ▼                                        │ (Delta Deep Clone,
Unity Catalog (federation)                     │  incremental, scheduled)
      │                                        │
      └──── Delta Sharing (open protocol) ── Unity Catalog (recipient)

Latency-sensitive consumers:  direct read over Delta Sharing
                              (freshest, higher per-read egress)
Latency-tolerant consumers:   local reads off the ADLS Delta table
                              (cheap reads, bounded staleness)

Orchestration plane
────────────────────
DDX (Dynamic Data eXchange)                  ── self-service meta-catalog
Databricks Asset Bundles + Azure DevOps YAML ── Sync-Job deploys
Reporting Job                                ── bytes → $ → producer chargeback
VACUUM on replicas                           ── GDPR delete propagation
```
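One Sync-Job cycle could be expressed as a pair of SQL statements run on a schedule. The sketch below only builds those statements; the three-part table names are hypothetical, and in a real job each string would be passed to `spark.sql(...)`:

```python
# Sketch of one Sync-Job cycle, under assumed three-part table names.
# CREATE OR REPLACE TABLE ... DEEP CLONE is Databricks/Delta Lake syntax;
# the catalog/schema/table names below are hypothetical.

SOURCE = "share_cat.aftersales.events"           # Delta Sharing recipient table
REPLICA = "azure_cat.aftersales.events_replica"  # local Delta table on ADLS

def sync_cycle_sql(retain_hours: int = 168) -> list[str]:
    """Statements for one incremental sync: clone, then GDPR-driven VACUUM."""
    return [
        # Deep Clone is incremental on re-run: only changed files are copied,
        # which is what bounds per-sync egress.
        f"CREATE OR REPLACE TABLE {REPLICA} DEEP CLONE {SOURCE}",
        # VACUUM physically removes files dropped by upstream deletes,
        # which is how GDPR erasure propagates to the replica.
        f"VACUUM {REPLICA} RETAIN {retain_hours} HOURS",
    ]

for stmt in sync_cycle_sql():
    print(stmt)
```

Keeping the clone and the `VACUUM` in the same cycle is what makes delete propagation a property of the sync, not a separate cleanup process.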
Design choices worth filing¶
- Delta Sharing as the wire protocol, Unity Catalog as the global catalog — one governance/permission surface, one data-exchange protocol, across clouds and metastores. See concepts/hub-and-spoke-governance.
- Consumer-chosen freshness/cost knob. Both the direct share and the replicated copy are first-class; each data product picks one per use case. This is the mesh's core trade-off surface.
- Replication as a distinct tier, not a protocol change. The freshness/egress trade-off is resolved by placing replicas below the sharing protocol — the producer still exposes exactly one share per data product; the replica is a consumer-side optimisation of that share. See patterns/cross-cloud-replica-cache.
- Cost visibility moved to the sharing tier. Sync Jobs record bytes-transferred; a daily Reporting Job aggregates them per Data Product and bills the producer, not the consumer — patterns/chargeback-cost-attribution.
- Operations deployed like a product. DDX for self-serve permission/workflow management, Databricks Asset Bundles + Azure DevOps for YAML-driven job deployment. The mesh has its own DevOps lifecycle, not ad-hoc admin UI clicks.
- GDPR wired into the replication contract. Delete propagation isn't a policy document; it's `VACUUM` on the replicated Delta tables, run as part of the sync cycle. A replica that doesn't run `VACUUM` is a compliance bug.
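The chargeback flow above can be sketched as a daily aggregation over per-sync byte counts. The record shape and the $/GB rate are assumptions for illustration, not the case study's actual schema:

```python
from collections import defaultdict

# Toy version of the daily Reporting Job: aggregate bytes moved per Data
# Product and bill the producer. The log format and the egress rate are
# illustrative assumptions.

EGRESS_USD_PER_GB = 0.09

sync_log = [  # one record per Sync-Job run, as the Sync Jobs might emit
    {"data_product": "warranty-cases", "bytes": 40 * 1024**3},
    {"data_product": "warranty-cases", "bytes": 15 * 1024**3},
    {"data_product": "ota-events",     "bytes": 120 * 1024**3},
]

def daily_chargeback(records: list[dict]) -> dict[str, float]:
    """USD owed per data product, charged to the producing side."""
    gb = defaultdict(float)
    for r in records:
        gb[r["data_product"]] += r["bytes"] / 1024**3
    return {dp: round(v * EGRESS_USD_PER_GB, 2) for dp, v in gb.items()}

print(daily_chargeback(sync_log))
```

Billing the producer rather than the consumer keeps the incentive with the party that controls the sync cadence and the share's layout.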
Reported outcomes (per the case study)¶
| Metric | Value | Scope |
|---|---|---|
| Egress cost reduction | 66 % | First 10 data products, weekly egress |
| Egress cost reduction (projected) | ~93 % | Annual, scaled to 50 use cases |
| Freshness cadence improvement | weekly → every second day | Sync-job window |
| Build time | "a few weeks" | End-to-end first version |
(Self-reported; egress-only, not TCO. Tier-3 source caveats apply.)
Seen in¶
- sources/2026-04-20-databricks-mercedes-benz-cross-cloud-data-mesh — origin + architecture + numbers.
Related¶
- systems/delta-sharing — exchange protocol.
- systems/delta-lake — replica-tier format + Deep Clone primitive.
- systems/unity-catalog — governance/catalog plane.
- systems/apache-iceberg — source-side format (federated in UC).
- systems/ddx-orchestrator — Mercedes-Benz's internal meta-catalog automating permissions and Sync-Job lifecycle.
- systems/databricks-asset-bundles — YAML-driven deployment unit for the Sync Jobs.
- concepts/data-mesh / concepts/hub-and-spoke-governance / concepts/cross-cloud-architecture / concepts/egress-cost — the conceptual primitives this system exercises.
- patterns/cross-cloud-replica-cache — the headline pattern.
- patterns/chargeback-cost-attribution — the cost-ops pattern.