
Mercedes-Benz Data Mesh

Mercedes-Benz's internal cross-cloud data-sharing backbone, built on the Databricks Data Intelligence Platform (Unity Catalog + Delta Sharing + Delta Lake Deep Clone) to connect an AWS-hosted source of truth with dozens of Azure-side consumer use cases. The headline dataset is after-sales data — vehicle over-the-air events and workshop visits — ~60 TB and growing, used for R&D, warranty case analysis, and marketing.

(Source: sources/2026-04-20-databricks-mercedes-benz-cross-cloud-data-mesh)

Problem shape

  • Multi-cloud + multi-region by design — each workload picks the hyperscaler that best fits. After-sales data sits on AWS (Iceberg on S3, registered in AWS Glue). Most analytics consumers sit on Azure.
  • Egress is the economic bottleneck. Live cross-cloud queries against 60 TB were technically fine but cost-prohibitive for latency-tolerant use cases.
  • Weekly full loads were the pre-existing mitigation. Cheap per-byte-moved, but 7-day staleness was intolerable for warranty triage.
  • Format mismatch. The producer writes Iceberg; the consumers read Delta. Forcing the producer to rewrite everything into Delta up front was not an acceptable shape.
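The egress trade-off in the first three bullets can be made concrete with a back-of-envelope model. All rates and scan volumes below are illustrative assumptions, not figures from the case study:

```python
# Hedged sketch: egress cost model for choosing between live cross-cloud
# reads and a scheduled replica. Rates and volumes are invented for
# illustration only.

EGRESS_USD_PER_GB = 0.09  # assumed provider egress rate; real rates vary


def monthly_egress_usd(gb_moved_per_month: float) -> float:
    """Dollar cost of moving the given volume across clouds in a month."""
    return round(gb_moved_per_month * EGRESS_USD_PER_GB, 2)


# Option A: every consumer query scans the remote share directly.
live_gb = 30 * 500      # assume 500 GB scanned per day, every day
# Option B: replicate incrementally, serve all reads locally.
replica_gb = 15 * 80    # assume a sync every 2nd day moving ~80 GB of deltas

print(monthly_egress_usd(live_gb))     # → 1350.0
print(monthly_egress_usd(replica_gb))  # → 108.0
```

Under these assumed numbers the replica pays for egress once per sync window rather than once per read, which is the economic argument the mesh's per-use-case knob exposes.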

Architecture

AWS (provider)                                          Azure (consumer)
─────────────────                                       ──────────────────
Iceberg on S3                                           Local Delta on ADLS
    │                                                          ▲
    │  AWS Glue                                                 │
    ▼                                                           │
Unity Catalog (federation)                          Unity Catalog (recipient)
    │                                                           ▲
    │                                                           │
    └──────── Delta Sharing (open protocol) ───────────────────┘
                   │                                            ▲
                   │                                            │
                   │                                      Sync Job
                   │                                (Delta Deep Clone,
                   │                                 incremental, scheduled)
           Latency-sensitive consumers:
           direct read over Delta Sharing
           (freshest, higher per-read egress)

           Latency-tolerant consumers:
           local reads off ADLS Delta table
           (cheap reads, bounded staleness)

Orchestration plane
────────────────────
  DDX (Dynamic Data eXchange) ── self-service meta-catalog
  Databricks Asset Bundles + Azure DevOps YAML ── Sync-Job deploys
  Reporting Job ── bytes → $ → producer chargeback
  VACUUM on replicas ── GDPR delete propagation
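A minimal sketch of what one Sync Job cycle might issue per data product, assuming the Databricks SQL `DEEP CLONE` and `VACUUM` commands described in the architecture; the catalog, schema, and table names are hypothetical placeholders, not names from the case study:

```python
# Hedged sketch of a Sync Job's per-table statements: an incremental
# DEEP CLONE of the shared table into a local Delta table, then VACUUM so
# upstream GDPR deletes are physically removed from the replica's data files.
# All object names below are hypothetical.

def sync_statements(shared_table: str, local_table: str,
                    retain_hours: int = 168) -> list[str]:
    return [
        # Re-running DEEP CLONE against an existing clone copies only
        # changed files, which keeps the scheduled egress incremental.
        f"CREATE OR REPLACE TABLE {local_table} DEEP CLONE {shared_table}",
        # VACUUM removes data files the clone no longer references,
        # propagating provider-side deletes to the replica.
        f"VACUUM {local_table} RETAIN {retain_hours} HOURS",
    ]


stmts = sync_statements(
    shared_table="aws_share.aftersales.vehicle_events",  # hypothetical
    local_table="azure_cat.aftersales.vehicle_events",   # hypothetical
)
# In a deployed job these would run as: for s in stmts: spark.sql(s)
```

Putting VACUUM inside the same cycle, rather than as a separate policy, is what makes delete propagation part of the replication contract.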

Design choices worth filing

  • Delta Sharing as the wire protocol, Unity Catalog as the global catalog — one governance/permission surface, one data-exchange protocol, across clouds and metastores. See concepts/hub-and-spoke-governance.
  • Consumer-chosen freshness/cost knob. Both the direct share and the replicated copy are first-class; each data product picks one per use case. This is the mesh's core trade-off surface.
  • Replication as a distinct tier, not a protocol change. The freshness/egress trade-off is resolved by placing replicas below the sharing protocol — the producer still exposes exactly one share per data product; the replica is a consumer-side optimisation of that share. See patterns/cross-cloud-replica-cache.
  • Cost visibility moved to the sharing tier. Sync Jobs record bytes-transferred; a daily Reporting Job aggregates them per Data Product and bills the producer, not the consumer — patterns/chargeback-cost-attribution.
  • Operations deployed like a product. DDX for self-serve permission/workflow management, Databricks Asset Bundles + Azure DevOps for YAML-driven job deployment. The mesh has its own DevOps lifecycle, not ad-hoc admin UI clicks.
  • GDPR wired into the replication contract. Delete propagation isn't a policy document — it's VACUUM on the replicated Delta tables as part of the sync cycle. A replica that doesn't run VACUUM is a compliance bug.
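The chargeback bullet above reduces to a small aggregation: roll bytes-transferred per sync run up to a per-data-product charge attributed to the producer. A sketch under assumed record fields and an assumed $/GB rate (neither is from the case study):

```python
# Hedged sketch of the daily Reporting Job's core aggregation. The record
# shape (data_product, producer, bytes_moved) and the rate are illustrative
# assumptions, not the actual Mercedes-Benz schema.

from collections import defaultdict

EGRESS_USD_PER_GB = 0.09  # assumed rate; real rates vary by cloud/region


def chargeback(sync_records):
    """sync_records: iterable of (data_product, producer, bytes_moved)."""
    bills = defaultdict(float)
    for product, producer, nbytes in sync_records:
        # Bill the producer, not the consumer, for the product's egress.
        bills[(producer, product)] += nbytes / 1e9 * EGRESS_USD_PER_GB
    return dict(bills)


runs = [
    ("aftersales_events", "team-aftersales", 40e9),  # 40 GB sync
    ("aftersales_events", "team-aftersales", 25e9),  # 25 GB sync
    ("workshop_visits",   "team-workshop",   10e9),  # 10 GB sync
]
print(chargeback(runs))
```

Billing the producer inverts the usual consumer-pays default and gives the data-product owner a direct incentive to keep syncs incremental.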

Reported outcomes (per the case study)

  Metric                              Value                       Scope
  ──────────────────────────────────  ──────────────────────────  ─────────────────────────────────────
  Egress cost reduction               66 %                        First 10 data products, weekly egress
  Egress cost reduction (projected)   ~93 %                       Annual, scaled to 50 use cases
  Freshness cadence improvement       weekly → every second day   Sync-job window
  Build time                          "a few weeks"               End-to-end first version

(Self-reported; egress-only, not TCO. Tier-3 source caveats apply.)
