
ZALANDO 2025-07-07 Tier 2


Zalando — Direct Data Sharing using Delta Sharing: Introduction — Our Journey to Empower Partners at Zalando

Summary

Zalando's Partner Tech division (within the Data Foundation pillar) shares analytical data with thousands of commercial partners across three business models — wholesale, Partner Program (direct-to-consumer), and Connected Retail (brick-and-mortar integration). Pre-existing access paths were a fragmented landscape of SFTP transfers, CSV downloads, self-service reports, and APIs — forcing partners to allocate ~1.5 FTE per month just on data extraction and consolidation. After partner-interview-driven discovery and a systematic solution evaluation, the team chose Delta Sharing — an open data-sharing protocol originated at Databricks (2021) — as their unified exchange substrate, and specifically chose Databricks' managed Delta Sharing service with Unity Catalog integration over self-hosting the open-source protocol. This post is the first in a series and establishes the problem framing, solution criteria, tool selection, pilot-to-platform evolution, and lessons learned — explicitly deferring the deep technical architecture to later posts.

Key takeaways

  • Scale framing that justifies the investment. Partner Tech manages 200+ datasets with sizes up to 200TB, across a platform steering >€5 billion GMV of commercial partner business. Partners range from small retailers with a few hundred SKUs to major brands with tens of thousands of products; data volumes span megabytes to terabytes.

  • Partner segmentation drives a non-uniform access surface. The team frames three partner tiers with orthogonal needs (concepts/segmented-partner-data-access-tiering): large partners want programmatic pipelines, medium partners want dashboards + periodic pulls, small partners want spreadsheets and ad-hoc access. Pre-Delta-Sharing, each tier was served by a different subsystem; the whole point of picking Delta Sharing was that its client ecosystem covers all three (Spark / pandas / Power BI / Excel connectors).

  • Fragmented-data cost canonicalised at the partner side. Partners were allocating 1.5 FTE per month per partner solely to data extraction and consolidation across SFTP / CSV / reports / APIs. This is the load-bearing business number for the whole programme — the architecture's ROI case is "this FTE is strategic talent wasted on data-plumbing".

  • Open protocol over proprietary was a deliberate architectural axis. Evaluation criteria explicitly required the solution to be cloud-agnostic, compatible with open tools, and extensible. Picking an open protocol means partners can integrate with their existing analytics stacks (Spark / pandas / Tableau / Power BI / Excel) rather than being forced onto Zalando-proprietary clients (patterns/open-protocol-over-proprietary-exchange).

  • Managed Delta Sharing over self-hosted was a deliberate architectural axis. The team explicitly chose Databricks' managed Delta Sharing service over self-hosting the open-source protocol because the managed service provided Unity Catalog governance, token management, audit logging, and built-in security — "operational excellence we needed for a production system serving critical partner relationships". Canonical instance of patterns/managed-services-over-custom-ml-platform generalised beyond ML to partner-data infra. The load-bearing disclosure: "operational excellence often trumps technical purity".

  • Zero-copy access framed as a first-class architectural property. Partners get direct access to live datasets without the overhead of constant data transfers — the data stays in Zalando's object store; the protocol is a read channel, not a copy channel. This avoids the storage duplication + ongoing-synchronisation tax of the legacy SFTP / CSV path. Canonicalised as concepts/zero-copy-data-sharing-protocol.

  • Token-based auth with activation-link-delivered credential files. For the initial phases, partners are provisioned via Databricks recipient tokens — a Recipient (digital identity for the partner) is created, an Activation Link (secure URL) is delivered, the partner downloads a credential file from it, and uses the file to authenticate Delta Sharing API calls. Future phase: Databricks OIDC federation so partners can authenticate with their own identity providers and skip the intermediate-token step. Canonicalised as concepts/activation-link-credential-bootstrap.

  • Three-primitive mental model: Share, Recipient, Activation Link. A Delta Share is a logical container grouping related tables for distribution; a Recipient is a digital identity representing a partner; an Activation Link is the secure URL a partner uses to obtain their credential file. The five deployment steps are: (1) prepare datasets in Unity Catalog → (2) create a Share (logical container) → (3) add Delta tables to the Share → (4) create a Recipient per partner → (5) grant permissions from the Share to the Recipient. Canonicalised as patterns/recipient-per-partner-share-per-dataset-group.

  • Cross-team dependency graph up-front. Success required heavy collaboration with three other Zalando platform teams: (a) the central Data Foundation team for Unity Catalog / governance integration; (b) the AppSec team for authentication + security-vector evaluation; (c) the IAM team for identity / auth-identity standards. The post frames this explicitly as the non-negotiable lesson: an external-data-access path is a cross-cutting design problem. Partner Tech was the first Zalando team to deploy Delta Sharing, so every cross-team touchpoint was a first-time design question.

  • Pilot-to-platform pattern via internal inbound demand. Once the Partner Tech pilot existed, internal teams across Zalando started asking to use the same primitive for their own data-sharing problems. Rather than let each team re-implement, Partner Tech evolved its pilot into a reusable recipient-management platform for the whole company. Canonicalised as patterns/pilot-to-platform-via-internal-demand with the load-bearing framing "internal demand validates external value".

  • Manual steps acceptable for pilot, platform requires automation. Pilot phase used manual recipient creation + manual activation-link distribution. Scaling up required turning every manual step into a platform primitive: "every manual step in our pilot became a feature requirement for our platform".
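
As a partner-side illustration of the token-based flow in the takeaways above, the sketch below reads a downloaded credential file and derives what a client needs from it. The field names (shareCredentialsVersion, endpoint, bearerToken) follow the open Delta Sharing profile-file format; the function names and any file, share, schema, or table names used with them are hypothetical and do not come from the post. It is a minimal sketch under those assumptions, not Zalando's implementation.

```python
import json
from pathlib import Path

# Fields the open Delta Sharing profile-file format requires for bearer-token auth.
REQUIRED_FIELDS = ("shareCredentialsVersion", "endpoint", "bearerToken")


def load_profile(path):
    """Read a credential (profile) file and check that the required fields exist."""
    profile = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [name for name in REQUIRED_FIELDS if name not in profile]
    if missing:
        raise ValueError(f"profile file is missing fields: {missing}")
    return profile


def auth_headers(profile):
    """Bearer-token header sent with each Delta Sharing REST call."""
    return {"Authorization": f"Bearer {profile['bearerToken']}"}


def list_shares_url(profile):
    """URL of the protocol's 'list shares' endpoint for this sharing server."""
    return profile["endpoint"].rstrip("/") + "/shares"


def table_url(profile_path, share, schema, table):
    """Build the '<profile-path>#<share>.<schema>.<table>' address used by clients."""
    return f"{profile_path}#{share}.{schema}.{table}"
```

With the open-source delta-sharing Python package, the string returned by table_url is the kind of address that delta_sharing.load_as_pandas accepts, which is how a single credential file can serve both notebook users and scheduled pipelines.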

Systems / concepts / patterns

Systems: systems/delta-sharing (primary), systems/databricks (managed service vendor), systems/unity-catalog (governance plane), systems/delta-lake (underlying table format), systems/zalando-partner-data-sharing-platform (the emerging Zalando-internal platform).

Concepts: concepts/zero-copy-data-sharing-protocol (live-data access without copy/sync tax), concepts/activation-link-credential-bootstrap (secure-URL + downloaded credential file as the partner onboarding channel), concepts/segmented-partner-data-access-tiering (large/medium/small partner tiers with orthogonal access-surface needs).

Patterns: patterns/open-protocol-over-proprietary-exchange (open standard + open client ecosystem as partner-friendly axis), patterns/recipient-per-partner-share-per-dataset-group (5-step Share + Recipient + Activation Link deployment primitive), patterns/pilot-to-platform-via-internal-demand (team-scoped pilot evolves into org-wide platform on validated inbound interest), patterns/managed-services-over-custom-ml-platform (generalised beyond ML: managed > custom when operational excellence matters more than technical purity).
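
The five-step deployment primitive can be sketched on the provider side as a small generator of Databricks SQL statements. The post does not show the commands it used; the statement forms below (CREATE SHARE, ALTER SHARE ... ADD TABLE, CREATE RECIPIENT, GRANT SELECT ON SHARE ... TO RECIPIENT) are the Databricks SQL forms as I understand them, and the function name and the share, table, and recipient names in the usage note are hypothetical. Step 1, preparing the datasets in Unity Catalog, is assumed to have happened already.

```python
import re

# Conservative identifier check: one to three dot-separated plain names.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*){0,2}")


def _checked(name):
    """Reject names that are not plain identifiers before interpolating them into SQL."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def share_setup_statements(share, tables, recipient):
    """Return SQL statements for steps 2-5: share, tables, recipient, grant."""
    share = _checked(share)
    recipient = _checked(recipient)
    # Step 2: create the Share (logical container).
    statements = [f"CREATE SHARE IF NOT EXISTS {share}"]
    # Step 3: add each Delta table to the Share.
    statements += [f"ALTER SHARE {share} ADD TABLE {_checked(t)}" for t in tables]
    # Step 4: create one Recipient per partner.
    statements.append(f"CREATE RECIPIENT IF NOT EXISTS {recipient}")
    # Step 5: grant the Recipient read access to the Share.
    statements.append(f"GRANT SELECT ON SHARE {share} TO RECIPIENT {recipient}")
    return statements
```

Calling share_setup_statements("partner_sales_share", ["main.partner.sales_daily"], "partner_acme") yields four statements in step order. Generating the statements from data rather than typing them by hand is one way the manual pilot steps could become platform features, though the post does not say this is how Zalando automated them.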

Operational numbers

  • 200+ datasets under Partner Tech management.
  • Up to 200TB per dataset (top end; sizes vary from MB to TB across partner mix).
  • >€5 billion GMV on Zalando's commercial partner platform.
  • 1.5 FTE per month per partner spent on pre-Delta-Sharing data extraction + consolidation.
  • Thousands of active partners across the three business models (wholesale / Partner Program / Connected Retail).
  • Pilot team size / Delta-Sharing deployment numbers: not disclosed in this post (deferred to later posts in the series).

Caveats

  • Introduction post / Part 1 of a series. Explicitly defers: (a) Delta Sharing technical-architecture internals, (b) Databricks / Unity Catalog capability deep-dive, (c) the implementation details of Zalando's platform evolution. Extraction here is business-and-rationale-altitude; technical internals will land in later ingests from the same series.

  • No performance numbers: no partner-count-deployed-to-date, no data-pull volume, no end-to-end latency, no cost comparison vs SFTP/CSV path. The 1.5 FTE/month number is the only hard disclosure and it is pre-state (cost of not having Delta Sharing), not post-state measurement.

  • OIDC federation is future work. Current production path is token-based auth via activation links + credential files. OIDC federation is framed as "looking into", not deployed.

  • Pilot phase scope unclear. How many partners are live on the Delta Sharing path today vs still on legacy SFTP/CSV/API is not disclosed. The post is framed as programme-design-rationale, not a rollout-status update.

  • Tier-2 Zalando + on-scope. This is a clear architecture narrative with real scale numbers + deliberate build-vs-buy analysis + cross-team collaboration disclosure + named lessons learned. Passes scope-filter decisively even though it's an introduction post.

