

Zalando Partner Data Sharing Platform

Zalando's Partner Data Sharing Platform is the internal service, built inside Zalando's Partner Tech division as part of the Data Foundation pillar, that delivers analytical data directly to Zalando's commercial partners: thousands of wholesale brands, Partner Program direct-to-consumer sellers, and Connected Retail brick-and-mortar integrations. It uses the Delta Sharing open protocol, delivered via Databricks' managed service and governed by Unity Catalog. It replaces a prior fragmented landscape of SFTP transfers, CSV downloads, self-service reports, and REST APIs that was costing each partner approximately 1.5 FTE per month in data-extraction and consolidation overhead.

The system started as a Partner Tech pilot and evolved into an organisation-wide recipient-management platform after internal teams across Zalando expressed demand for the same primitive (canonical instance of patterns/pilot-to-platform-via-internal-demand).

Scope of data shared

  • 200+ datasets spanning Zalando commerce data across Europe.
  • Up to 200 TB per dataset (sizes range from megabytes to terabytes depending on partner and dataset).
  • Covers >€5 billion GMV of commercial partner business on Zalando.
  • Three business models produce the underlying data:
      • Wholesale — Zalando purchases + resells products.
      • Partner Program — partners sell direct to consumers via Zalando.
      • Connected Retail — brick-and-mortar stores integrated into Zalando's platform.

(Source: sources/2025-07-07-zalando-direct-data-sharing-using-delta-sharing-introduction-our-journey-to-empower-partners)

Architectural layers

Data preparation (Unity Catalog)

Datasets live in Zalando's object store as Delta tables, cataloged in systems/unity-catalog. Unity Catalog is the single source of truth for metadata + governance — table schemas, access grants, lineage. Datasets are prepared based on per-partner needs.

Sharing (Delta Sharing protocol)

The exchange layer is systems/delta-sharing — an open protocol Databricks originated (2021) that exposes Delta tables to external clients with zero-copy semantics: partners read live datasets via the Delta Sharing API rather than receiving copies. See concepts/zero-copy-data-sharing-protocol.
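In protocol terms, the partner's client authenticates with a bearer token and enumerates what it can see over plain HTTPS, starting with the protocol's "List Shares" call. A minimal sketch of building that request, assuming a placeholder endpoint and token (not Zalando's):

```python
# Illustrative sketch of the Delta Sharing REST exchange, not Zalando's code.
# The endpoint URL and token below are placeholder values.
import urllib.request

def list_shares_request(endpoint: str, bearer_token: str) -> urllib.request.Request:
    """Build the protocol's 'List Shares' call: GET {endpoint}/shares."""
    return urllib.request.Request(
        url=f"{endpoint.rstrip('/')}/shares",
        headers={"Authorization": f"Bearer {bearer_token}"},
        method="GET",
    )

req = list_shares_request("https://sharing.example.com/delta-sharing", "token-123")
print(req.full_url)                     # .../shares
print(req.get_header("Authorization"))  # Bearer token-123
```

Zero-copy follows from this shape: every call reads the live Delta tables through the provider endpoint, so nothing is exported or duplicated on the partner side.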

Access control (Shares + Recipients)

The deployment model rests on three primitives:

  1. Delta Share — a logical container grouping related tables.
  2. Recipient — a digital identity representing one partner.
  3. Activation Link — a secure URL that lets the partner download a credential file; the credential file is then used to authenticate Delta Sharing API calls.

Five-step partner onboarding:

  1. Prepare datasets (Delta tables) in Unity Catalog.
  2. Create a Share (logical container).
  3. Add Delta tables to the Share.
  4. Create a Recipient per partner.
  5. Grant permissions on the Share to the Recipient.

Canonicalised as patterns/recipient-per-partner-share-per-dataset-group.
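Steps 2–5 map onto standard Databricks SQL for shares and recipients. A sketch generating those statements, with hypothetical share, table, and recipient names (not Zalando's):

```python
# Sketch of onboarding steps 2-5 as Databricks SQL, generated for illustration.
# All identifiers below are hypothetical placeholders.
def onboarding_statements(share: str, tables: list[str], recipient: str) -> list[str]:
    stmts = [f"CREATE SHARE IF NOT EXISTS {share}"]                  # step 2: logical container
    stmts += [f"ALTER SHARE {share} ADD TABLE {t}" for t in tables]  # step 3: attach Delta tables
    stmts.append(f"CREATE RECIPIENT IF NOT EXISTS {recipient}")      # step 4: partner identity
    stmts.append(f"GRANT SELECT ON SHARE {share} TO RECIPIENT {recipient}")  # step 5: permissions
    return stmts

for s in onboarding_statements(
    "partner_sales_share",
    ["main.reporting.daily_orders", "main.reporting.returns"],
    "acme_brand",
):
    print(s)
```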

Auth (token-based today, OIDC federation planned)

Current: Activation link → downloaded credential file → token auth. See concepts/activation-link-credential-bootstrap.
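The downloaded credential file is a small JSON profile defined by the Delta Sharing protocol (credentials version, provider endpoint, bearer token, expiry). A parsing sketch with placeholder values:

```python
# The credential file format comes from the Delta Sharing protocol;
# all values below are placeholders, not real credentials.
import json

profile_text = """
{
  "shareCredentialsVersion": 1,
  "endpoint": "https://sharing.example.com/delta-sharing",
  "bearerToken": "dapiXXXXXXXX",
  "expirationTime": "2025-12-31T00:00:00.0Z"
}
"""
profile = json.loads(profile_text)

# Every subsequent Delta Sharing API call authenticates with this token:
auth_header = {"Authorization": f"Bearer {profile['bearerToken']}"}
print(profile["endpoint"], auth_header["Authorization"])
```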

Future: Databricks OIDC federation — partners authenticate with their own identity providers, bypassing the intermediate-token step. Framed as "looking into" rather than deployed.

Partner tiers served

Three tiers with orthogonal access-surface needs (concepts/segmented-partner-data-access-tiering):

  • Large partners — want programmatic / automated pipelines. Access via Delta Sharing connector APIs from Spark / pandas / internal data-pipeline stacks.
  • Medium partners — want dashboards + periodic pulls. Access via Delta Sharing clients for Power BI / Tableau.
  • Small partners — want spreadsheets + ad-hoc access. Access via Excel-compatible Delta Sharing client.

Before this platform, each tier was served by a different subsystem (APIs for large, reports for medium, CSV downloads for small). Delta Sharing's broad client ecosystem unifies the access surface — the load-bearing reason an open protocol won the evaluation over proprietary alternatives (patterns/open-protocol-over-proprietary-exchange).

Cross-team dependency graph

Being the first team at Zalando to deploy Delta Sharing forced heavy collaboration with three other platform teams:

  • Data Foundation (central) — Unity Catalog + governance integration, fitting Delta Sharing into Zalando's broader data architecture.
  • AppSec — authentication design + security-vector evaluation (external data access is a new attack surface).
  • IAM — identity / auth-identity standards alignment.

The post frames this as the non-negotiable lesson: every new cross-cutting data-platform primitive needs a joint design pass with governance + security + identity teams up front, before the pilot.

Pilot → platform evolution

Pilot phase (Partner Tech only):

  • Manual recipient creation per partner.
  • Manual activation-link distribution.
  • Supports partners who want the data; does not scale to thousands of partners.

Platform phase (org-wide):

  • Internal demand from other Zalando teams validated the primitive was more broadly useful than just Partner Tech.
  • Rather than let each team re-implement, Partner Tech evolved the pilot into a reusable recipient-management platform for all of Zalando.
  • Every manual pilot step becomes a platform feature.
  • Platform ships with comprehensive user guides (Pandas + Apache Spark step-by-step) so partners can go from activation link to first dataset pull in minutes.
  • Broader governance framework needed (different teams have different compliance needs; platform must support multi-tenant governance).
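As a sketch of what those guides cover: the open-source delta-sharing client addresses tables as `<profile>#<share>.<schema>.<table>`, where the profile is the saved credential file. The profile path and table coordinates below are illustrative placeholders:

```python
# Illustrative first-pull sketch; profile path, share, schema, and table
# names are placeholders, not Zalando's actual identifiers.
profile_path = "zalando.share"  # credential file downloaded via activation link

# Client table URLs take the form "<profile>#<share>.<schema>.<table>":
url = f"{profile_path}#partner_sales_share.reporting.daily_orders"
print(url)

# Pandas path (requires `pip install delta-sharing`):
#   import delta_sharing
#   df = delta_sharing.load_as_pandas(url)

# Spark path (requires the delta-sharing Spark connector on the classpath):
#   df = spark.read.format("deltaSharing").load(url)
```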

Why Databricks managed over self-hosted Delta Sharing

Delta Sharing is open source; Zalando could have self-hosted the reference implementation. They chose the managed service instead. Reasons surfaced in the post:

  • Operational excellence over technical purity. Managed service = no infrastructure component to operate; team focuses on partner value, not server maintenance.
  • Unity Catalog integration out of the box — governance plane for free.
  • Built-in audit logging — required for a production system serving critical partner relationships.
  • Built-in security primitives — credential files, token management, access control all delivered by the platform.

Load-bearing quote: "operational excellence often trumps technical purity" — generalised as patterns/managed-services-over-custom-ml-platform (the ML label is historical; pattern applies to any managed-vs-self-hosted decision in production infra).
