Zalando - Direct Data Sharing using Delta Sharing: Introduction - Our Journey to Empower Partners at Zalando
Summary
Zalando's Partner Tech division (within the Data Foundation pillar) shares analytical data with thousands of commercial partners across three business models: wholesale, Partner Program (direct-to-consumer), and Connected Retail (brick-and-mortar integration). The pre-existing access paths were a fragmented landscape of SFTP transfers, CSV downloads, self-service reports, and APIs, which forced each partner to allocate ~1.5 FTE per month just to data extraction and consolidation. After partner-interview-driven discovery and a systematic solution evaluation, the team chose Delta Sharing, an open data-sharing protocol that originated at Databricks (2021), as its unified exchange substrate. It specifically chose Databricks' managed Delta Sharing service with Unity Catalog integration over self-hosting the open-source protocol. This post is the first in a series. It establishes the problem framing, solution criteria, tool selection, pilot-to-platform evolution, and lessons learned, and explicitly defers the deep technical architecture to later posts.
Key takeaways
- Scale framing that justifies the investment. Partner Tech manages 200+ datasets with sizes up to 200TB, on a platform steering >€5 billion GMV of commercial-partner business. Partners range from small retailers with a few hundred SKUs to major brands with tens of thousands of products; data volumes span megabytes to terabytes.
- Partner segmentation drives a non-uniform access surface. The team frames three partner tiers with orthogonal needs (concepts/segmented-partner-data-access-tiering): large partners want programmatic pipelines, medium partners want dashboards + periodic pulls, and small partners want spreadsheets and ad-hoc access. Before Delta Sharing, each tier was served by a different subsystem; the point of picking Delta Sharing was that its client ecosystem (Spark / pandas / Power BI / Excel connectors) covers all three.
- Fragmented-data cost, quantified on the partner side. Partners were allocating 1.5 FTE per month per partner solely to data extraction and consolidation across SFTP / CSV / reports / APIs. This is the load-bearing business number for the whole programme: the architecture's ROI case is "this FTE is strategic talent wasted on data-plumbing".
- Open protocol over proprietary was a deliberate architectural axis. The evaluation criteria explicitly required a solution that is cloud-agnostic, compatible with open tools, and extensible. Picking an open protocol means partners can integrate with their existing analytics stacks (Spark / pandas / Tableau / Power BI / Excel) rather than being forced onto Zalando-proprietary clients (patterns/open-protocol-over-proprietary-exchange).
- Managed Delta Sharing over self-hosted was a deliberate architectural axis. The team explicitly chose Databricks' managed Delta Sharing service over self-hosting the open-source protocol because the managed service provided Unity Catalog governance, token management, audit logging, and built-in security: the "operational excellence we needed for a production system serving critical partner relationships". This is the canonical instance of patterns/managed-services-over-custom-ml-platform, generalised beyond ML to partner-data infra. The load-bearing disclosure: "operational excellence often trumps technical purity".
- Zero-copy access framed as a first-class architectural property. Partners get direct access to live datasets without the overhead of constant data transfers: the data stays in Zalando's object store, and the protocol is a read channel, not a copy channel. This avoids the storage-duplication + ongoing-synchronisation tax of the legacy SFTP / CSV path. Canonicalised as concepts/zero-copy-data-sharing-protocol.
- Token-based auth with activation-link-delivered credential files. For the initial phases, partners are provisioned via Databricks recipient tokens: a Recipient (digital identity for the partner) is created, an Activation Link (secure URL) is delivered, and the partner downloads a credential file from it and uses that file to authenticate Delta Sharing API calls. Future phase: Databricks OIDC federation, so partners can authenticate with their own identity providers and skip the intermediate-token step. Canonicalised as concepts/activation-link-credential-bootstrap.
- Three-primitive mental model: Share, Recipient, Activation Link. A Delta Share is a logical container grouping related tables for distribution; a Recipient is a digital identity representing a partner; an Activation Link is the secure URL a partner uses to obtain their credential file. The five deployment steps are: (1) prepare datasets in Unity Catalog → (2) create a Share → (3) add Delta tables to the Share → (4) create a Recipient per partner → (5) grant permissions on the Share to the Recipient. Canonicalised as patterns/recipient-per-partner-share-per-dataset-group.
- Cross-team dependency graph up-front. Success required heavy collaboration with three other Zalando platform teams: (a) the central Data Foundation team for Unity Catalog / governance integration; (b) the AppSec team for authentication + security-vector evaluation; (c) the IAM team for identity and authentication standards. The post frames this explicitly as the non-negotiable lesson: the external-data-access path is a cross-cutting design problem. Partner Tech was the first Zalando team to deploy Delta Sharing, so every cross-team touchpoint was a first-time design question.
- Pilot-to-platform pattern via internal inbound demand. Once the Partner Tech pilot existed, internal teams across Zalando started asking to use the same primitive for their own data-sharing problems. Rather than let each team re-implement it, Partner Tech evolved its pilot into a reusable recipient-management platform for the whole company. Canonicalised as patterns/pilot-to-platform-via-internal-demand, with the load-bearing framing "internal demand validates external value".
- Manual steps acceptable for the pilot, automation required for the platform. The pilot phase used manual recipient creation + manual activation-link distribution. Scaling up required turning every manual step into a platform primitive: "every manual step in our pilot became a feature requirement for our platform".
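To make the zero-copy takeaway concrete: in the open Delta Sharing protocol, a table query does not stream table bytes through the sharing server. The server answers with newline-delimited JSON (a `protocol` line, a `metaData` line, then one `file` line per data file), and each `file` line carries a short-lived pre-signed URL that the client uses to read the Parquet file directly from the provider's object store. The minimal Python sketch below parses such a response. The sample payload, the bucket URLs, and the `SharedFile` / `parse_query_response` names are hypothetical, and real responses carry more fields than the sketch reads.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedFile:
    """One data file the server points the client at; its bytes are not proxied."""
    url: str   # short-lived pre-signed URL into the provider's object store
    size: int  # file size in bytes, as reported by the server


def parse_query_response(ndjson: str) -> tuple[dict, list[SharedFile]]:
    """Split a table-query response into table metadata and readable file URLs."""
    metadata: dict = {}
    files: list[SharedFile] = []
    for line in ndjson.splitlines():
        if not line.strip():
            continue
        action = json.loads(line)
        if "metaData" in action:
            metadata = action["metaData"]
        elif "file" in action:
            entry = action["file"]
            files.append(SharedFile(url=entry["url"], size=entry["size"]))
        # The "protocol" line (reader-version negotiation) is ignored in this sketch.
    return metadata, files


# Hypothetical response body; real URLs are signed object-store links.
SAMPLE = "\n".join([
    '{"protocol": {"minReaderVersion": 1}}',
    '{"metaData": {"id": "t-1", "format": {"provider": "parquet"}, "schemaString": "{}", "partitionColumns": []}}',
    '{"file": {"url": "https://bucket.example/part-0.parquet?sig=abc", "id": "f-0", "partitionValues": {}, "size": 1024}}',
    '{"file": {"url": "https://bucket.example/part-1.parquet?sig=def", "id": "f-1", "partitionValues": {}, "size": 2048}}',
])

meta, files = parse_query_response(SAMPLE)
print(meta["id"], [f.size for f in files])  # t-1 [1024, 2048]
```

The design point is that the sharing server only hands out read grants; storage, duplication, and synchronisation stay on the provider side, which is what removes the copy-and-sync tax of the SFTP / CSV path.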
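The partner side of the activation-link flow ends with a credential (profile) file. In the open Delta Sharing protocol this is a small JSON document with `shareCredentialsVersion`, `endpoint`, `bearerToken`, and an optional `expirationTime`, and REST calls carry the token as an `Authorization: Bearer` header. The sketch below validates such a file and builds (but does not send) the request that lists the shares available to the partner. The endpoint URL and token are placeholders, the helper names are hypothetical, and Databricks-issued files may contain additional fields.

```python
from __future__ import annotations

import json
import urllib.request
from datetime import datetime, timezone


def load_profile(text: str) -> dict:
    """Parse and sanity-check a Delta Sharing profile (credential) file."""
    profile = json.loads(text)
    if profile.get("shareCredentialsVersion") != 1:
        raise ValueError("unsupported shareCredentialsVersion")
    for key in ("endpoint", "bearerToken"):
        if key not in profile:
            raise ValueError(f"profile is missing {key!r}")
    return profile


def is_expired(profile: dict, now: datetime) -> bool:
    """True once the optional expirationTime has passed; `now` must be timezone-aware."""
    raw = profile.get("expirationTime")
    if raw is None:
        return False
    # Older Pythons' fromisoformat rejects a trailing "Z" (and some fractional-second
    # forms), so this sketch only normalises "Z" to "+00:00".
    return now >= datetime.fromisoformat(raw.replace("Z", "+00:00"))


def list_shares_request(profile: dict) -> urllib.request.Request:
    """Build (without sending) GET {endpoint}/shares with the bearer token attached."""
    url = profile["endpoint"].rstrip("/") + "/shares"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {profile['bearerToken']}"}
    )


# Placeholder values; a real file is downloaded from the partner's activation link.
SAMPLE_PROFILE = json.dumps({
    "shareCredentialsVersion": 1,
    "endpoint": "https://sharing.example.com/delta-sharing/",
    "bearerToken": "hypothetical-token",
    "expirationTime": "2030-01-01T00:00:00Z",
})

profile = load_profile(SAMPLE_PROFILE)
request = list_shares_request(profile)
print(request.full_url)  # https://sharing.example.com/delta-sharing/shares
print(is_expired(profile, datetime.now(timezone.utc)))
```

Because the file is a bearer credential, whoever holds it can read the shares; this is part of why the post treats OIDC federation (partners authenticating with their own identity providers) as the follow-up to token-based provisioning.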
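Steps 2 to 5 of the five-step deployment map onto Databricks SQL statements: `CREATE SHARE`, `ALTER SHARE ... ADD TABLE`, `CREATE RECIPIENT`, and `GRANT SELECT ON SHARE ... TO RECIPIENT`. The post does not show Zalando's scripts, so the following is a hypothetical helper that emits those statements for one dataset group and a list of partners, following the recipient-per-partner / share-per-dataset-group shape. Step 1 (preparing the tables in Unity Catalog) is assumed done; the `partner_` naming prefix and the identifier check are illustrative choices, and executing the statements (e.g. from a Databricks notebook) is left out.

```python
from __future__ import annotations

import re

# Plain unquoted identifiers only; backtick-quoted names are out of scope for this sketch.
_IDENT = r"[A-Za-z_][A-Za-z0-9_]*"
_NAME = re.compile(_IDENT)
_TABLE = re.compile(rf"{_IDENT}\.{_IDENT}\.{_IDENT}")  # catalog.schema.table


def _check(pattern: re.Pattern, value: str) -> str:
    """Reject anything that is not a plain identifier, so no SQL can be injected."""
    if not pattern.fullmatch(value):
        raise ValueError(f"invalid identifier: {value!r}")
    return value


def share_statements(share: str, tables: list[str], partners: list[str]) -> list[str]:
    """Emit SQL for steps 2-5: share, tables, one recipient per partner, grants."""
    share = _check(_NAME, share)
    # Step 2: the Share is the logical container for one dataset group.
    stmts = [f"CREATE SHARE IF NOT EXISTS {share}"]
    # Step 3: add each prepared Unity Catalog table to the Share.
    stmts += [f"ALTER SHARE {share} ADD TABLE {_check(_TABLE, t)}" for t in tables]
    for partner in partners:
        # Step 4: one Recipient (digital identity) per partner; hypothetical naming.
        recipient = _check(_NAME, f"partner_{partner}")
        stmts.append(f"CREATE RECIPIENT IF NOT EXISTS {recipient}")
        # Step 5: grant read access on the Share to that Recipient.
        stmts.append(f"GRANT SELECT ON SHARE {share} TO RECIPIENT {recipient}")
    return stmts


for stmt in share_statements("sales_insights", ["prod.partner.sales_daily"], ["acme"]):
    print(stmt)
```

A recipient created without a Databricks sharing identifier is a token-based (open sharing) recipient; per Databricks' documentation its activation link can be read back with `DESCRIBE RECIPIENT`, and that link is what the pilot distributed manually to each partner.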
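The post says every manual pilot step became a platform feature requirement, but it does not describe the platform's design. One common way to automate this kind of access management, offered here as an assumption and not as Zalando's disclosed design, is a declarative reconciler: the platform stores the desired partner-to-share grants, compares them with what currently exists, and plans the actions needed to converge. The pure-Python planning step below is hypothetical; each planned action would correspond to one of the deployment steps described above (create a Recipient, grant or revoke access on a Share), and deleting recipients is deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    kind: str  # "create_recipient", "grant" or "revoke"
    recipient: str
    share: Optional[str] = None


def plan(desired: dict[str, set[str]], existing: dict[str, set[str]]) -> list[Action]:
    """Diff desired recipient -> shares grants against existing ones (hypothetical)."""
    actions: list[Action] = []
    for recipient in sorted(desired):
        if recipient not in existing:
            # Replaces the pilot's manual recipient creation.
            actions.append(Action("create_recipient", recipient))
        for share in sorted(desired[recipient] - existing.get(recipient, set())):
            actions.append(Action("grant", recipient, share))
    for recipient in sorted(existing):
        # Grants that are no longer desired are revoked rather than left dangling.
        for share in sorted(existing[recipient] - desired.get(recipient, set())):
            actions.append(Action("revoke", recipient, share))
    return actions


desired = {"partner_acme": {"sales_insights", "returns"}}
existing = {"partner_acme": {"sales_insights"}, "partner_old": {"sales_insights"}}
for action in plan(desired, existing):
    print(action)
```

Sorting makes the plan deterministic, so re-running it against an unchanged state yields an empty plan; that idempotence is what lets a platform run such a step repeatedly instead of relying on one-off manual provisioning.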
Systems / concepts / patterns
Systems: systems/delta-sharing (primary), systems/databricks (managed service vendor), systems/unity-catalog (governance plane), systems/delta-lake (underlying table format), systems/zalando-partner-data-sharing-platform (the emerging Zalando-internal platform).
Concepts: concepts/zero-copy-data-sharing-protocol (live-data access without copy/sync tax), concepts/activation-link-credential-bootstrap (secure-URL + downloaded credential file as the partner onboarding channel), concepts/segmented-partner-data-access-tiering (large/medium/small partner tiers with orthogonal access-surface needs).
Patterns: patterns/open-protocol-over-proprietary-exchange (open standard + open client ecosystem as partner-friendly axis), patterns/recipient-per-partner-share-per-dataset-group (5-step Share + Recipient + Activation Link deployment primitive), patterns/pilot-to-platform-via-internal-demand (team-scoped pilot evolves into org-wide platform on validated inbound interest), patterns/managed-services-over-custom-ml-platform (generalised beyond ML: managed > custom when operational excellence matters more than technical purity).
Operational numbers
- 200+ datasets under Partner Tech management.
- Up to 200TB per dataset (top end; sizes vary from MB to TB across partner mix).
- >€5 billion GMV on Zalando's commercial partner platform.
- 1.5 FTE per month per partner spent on pre-Delta-Sharing data extraction + consolidation.
- Thousands of active partners across the three business models (wholesale / Partner Program / Connected Retail).
- Pilot team size / Delta-Sharing deployment numbers: not disclosed in this post (deferred to later posts in the series).
Caveats
- Introduction post / Part 1 of a series. It explicitly defers (a) Delta Sharing technical-architecture internals, (b) a Databricks / Unity Catalog capability deep-dive, and (c) the implementation details of Zalando's platform evolution. The extraction here stays at the business-and-rationale level; technical internals will land in later ingests from the same series.
- No performance numbers: no count of partners onboarded to date, no data-pull volume, no end-to-end latency, and no cost comparison against the SFTP/CSV path. The 1.5 FTE/month figure is the only hard disclosure, and it is a pre-state number (the cost of not having Delta Sharing), not a post-state measurement.
- OIDC federation is future work. The current production path is token-based auth via activation links + credential files. OIDC federation is framed as something the team is "looking into", not as deployed.
- Pilot phase scope unclear. How many partners are live on the Delta Sharing path today, versus still on the legacy SFTP/CSV/API paths, is not disclosed. The post is framed as programme-design rationale, not a rollout-status update.
- Tier-2 Zalando source, on-scope. This is a clear architecture narrative with real scale numbers, a deliberate build-vs-buy analysis, disclosed cross-team collaboration, and named lessons learned, so it passes the scope filter decisively even though it is an introduction post.
Source
- Original: https://engineering.zalando.com/posts/2025/07/direct-data-sharing-using-delta-sharing.html
- Raw markdown: raw/zalando/2025-07-07-direct-data-sharing-using-delta-sharing-introduction-our-jou-06ecd546.md
Related
- systems/delta-sharing — the exchange protocol chosen.
- systems/databricks — managed-service vendor.
- systems/unity-catalog — governance plane paired with Delta Sharing.
- systems/delta-lake — underlying table format being shared.
- systems/zalando-partner-data-sharing-platform — the Zalando platform built on top of Delta Sharing, evolving from Partner Tech pilot to company-wide service.
- concepts/zero-copy-data-sharing-protocol — canonical property that differentiates Delta Sharing from SFTP/CSV pipelines.
- concepts/activation-link-credential-bootstrap — partner onboarding mechanism.
- concepts/segmented-partner-data-access-tiering — the three-tier partner framing that motivates picking a protocol with a broad client ecosystem.
- patterns/open-protocol-over-proprietary-exchange — canonical architectural axis.
- patterns/recipient-per-partner-share-per-dataset-group — five-step deployment primitive.
- patterns/pilot-to-platform-via-internal-demand — organisational evolution pattern.
- patterns/managed-services-over-custom-ml-platform — generalised build-vs-buy framing applied here to partner-data infra.
- companies/zalando — context / Partner Tech division.