
Zalando

What they do

Zalando SE is Europe's largest online fashion and lifestyle platform, headquartered in Berlin. Beyond commerce, Zalando is known in the systems community for heavy open-source contribution around Postgres-on-Kubernetes and for in-house experimentation platform work.

Zalando Engineering (engineering.zalando.com) is a Tier-2 source on the sysdesign-wiki: consistent output on distributed systems, Kubernetes, Postgres internals, cloud platform engineering, and experimentation / A/B-testing infrastructure, though the blog mixes in recruiting and product-announcement posts.

Wiki anchor axes

Zalando has nine canonical axes on the wiki so far:

  1. Postgres-on-Kubernetes / kernel-level latency (opened 2020-06-23) — empirical kernel-level measurement (perf, softirq tracepoints, network namespaces) combined with operator-level deployment pragmatism.
  2. Experimentation platform evolution (opened 2021-01-11) — Octopus + the org-level crawl/walk/run journey structured around concepts/experimentation-evolution-model-fabijan; statistical trustworthiness via concepts/sample-ratio-mismatch auto-alerts, A/B-test design audit, OEC guidance, quasi-experimental methods.
  3. Mobile testing discipline / open-source Swift tooling (opened 2021-02-01) — Vijaya Kandel's Randomizer open-source Swift library for randomised-input testing in iOS codebases. Entry-level property-based testing at mobile altitude; names the example-based constant-input antipattern and provides a type-class-driven random generator as the fix.
  4. JVM integration testing discipline / Testcontainers idiom (opened 2021-02-24) — Zalando Marketing Services's Java backend team documents the canonical application-developer altitude Testcontainers setup: singleton PostgreSQLContainer on an AbstractIntegrationTest base class, @DynamicPropertySource wiring, Maven Surefire + Failsafe phase split, ~4 s / ~0.4 s / ~20 s startup numbers quoted, and the explicit caveat that Testcontainers alone doesn't catch real-API drift → concepts/contract-testing (Spring Cloud Contract). Pairs with axis 3 at a complementary JVM-backend altitude; the two 2021 posts form a Zalando testing-discipline sub-axis.
  5. Cyber-Week prep / load-test automation (opened 2020-10-07, extended 2021-03-01) — two-part narrative from the Payments + SRE orgs. (a) The 2020 retrospective (sources/2020-10-07-zalando-how-zalando-prepares-for-cyber-week) establishes patterns/live-load-test-in-production and patterns/annual-peak-event-as-capability-forcing-function plus concepts/adaptive-paging as the organisational pattern. (b) The 2021 follow-up (sources/2021-03-01-zalando-building-an-end-to-end-load-test-automation-system-on-top-of-kubernetes) opens the pre-prod break-things complement: the Load Test Conductor Go microservice exposes a declarative load-test API inspired by Kubernetes, drives Locust workers via a KPI-closed-loop orders-per-minute algorithm, and uses Skipper's header-based routing to switch each request between the real dependency and a Hoverfly mock. Ties Zalando's load-testing discipline across live-prod and break-things-pre-prod into a single capability stack.
  6. Unified GraphQL BFF / API platform (opened 2021-03-03) — Rajesh Jain's retrospective on Zalando's migration from per-surface BFFs (adopted in 2015 alongside microservices) to a single-service Unified Backend-For-Frontends GraphQL owning one schema across 12+ domains (in production since end of 2018; sources/2021-03-03-zalando-how-we-use-graphql-at-europes-largest-fashion-e-commerce-company). The axis names five BFF pathologies rooted in Conway's Law (including the delivery-window inconsistency worked example), runs on Zalando's in-house open-source graphql-jit JIT executor, enforces a No Business Logic principle in the GraphQL layer, and uses per-platform deployment bulkhead (separate Web / mobile-App deployments of the same service) for fault isolation. Positions Zalando's single-service unified-graph choice explicitly against Apollo Federation.
  7. Frontend platform evolution / micro-frontends → entity-based composition (opened 2021-03-10) — Part 1 of a series on Zalando's second-generation frontend platform Interface Framework (IF; designed 2018, ~90% traffic by March 2021), which replaces the 2015-era Project Mosaic Fragment-based micro-frontend architecture with an entity-based page-composition model: pages are request-time trees of typed Entities (Product, Collection, Outfit) chosen by personalisation, and Renderers (one-per-Entity-type React components; patterns/entity-to-renderer-mapping) are the contribution unit. The Rendering Engine (Node.js + browser runtime) walks the tree, applies declarative rendering rules, and composes the output. Cross-cutting concerns (monitoring, consent, A/B testing via Octopus, design system, bundle-size optimisation) move into the platform, and every PR is gated by Lighthouse CI + Bundle Size Limits + Web Vitals. The axis pairs tightly with axis 6 (the GraphQL BFF is IF's data aggregation layer).
  8. JVM language governance / Kotlin ADOPT ring + backend stack blueprint (opened 2021-06-30) — Zalando's Tech Radar promotes Kotlin TRIAL → ADOPT (100+ new apps / year, 250+-member Kotlin Guild), and the promotion post publishes the default Kotlin backend-service stack: Spring Boot on Kubernetes with Gradle (Kotlin DSL) + Ktlint; OpenAPI contracts linted by Zally on every build as the API-first gate; Skipper OAuth filters / Route Groups / Fabric Gateway as the three-tier AuthN/AuthZ choice; Lettuce for Redis; Spring Data JPA or jOOQ for RDBMS; opentracing-toolbox (with a Kotlin submodule) for tracing. The axis canonicalises repository-template nudges over mandates as the Zalando consistency discipline — patterns/template-project-nudges-consistency. Ktor + GraalVM flagged as ASSESS-ring successor. Canonical wiki source for "what does an ADOPT-ring promotion look like operationally."
  9. MDM / knowledge-graph-driven data modeling (opened 2021-07-28) — Zalando's in-house MDM component uses a knowledge graph in Neo4j as the design-time authoring substrate from which both the logical data model of the golden record and the per-source transformation data model are generated via a Python script (sources/2021-07-28-zalando-knowledge-graph-technologies-accelerate-and-improve-the-data-model-definition). Domain experts author column → Concept / Attribute / Relationship mappings (direct 1-to-1 or indirect 1-to-many with a transformation function); a semantic layer of business concepts sits between source schemas and the golden-record schema. Knowledge-graph-for-MDM-modeling is the core pattern; mapping-driven schema generation is the generalised pattern; visual-graph-for-business-engineering-alignment — using Neo4j-rendered diagrams as the primary business-engineering communication artifact — is named by the post as the #1 benefit ("The dialogue between business and technology in designing the golden record logical data model has improved and accelerated the process of creating a correct model"). Consolidated-style MDM; scoped to "tens of tables and hundreds of columns". Data lineage falls out of the graph as a side-effect capability. Opens Zalando's third canonical knowledge-graph wiki instance alongside Dropbox Dash (retrieval substrate) and Netflix UDA (enterprise integration substrate).
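
Axis 2 mentions auto-alerts for concepts/sample-ratio-mismatch. The sketch below shows one common way such a check can be written, assuming a two-variant test and a chi-square goodness-of-fit test with one degree of freedom. The function names, the 0.001 alert threshold, and the counts are illustrative assumptions; this is not Octopus's implementation.

```python
import math


def srm_p_value(control: int, treatment: int,
                expected_control_share: float = 0.5) -> float:
    """p-value of a chi-square goodness-of-fit test (1 degree of freedom)
    comparing observed assignment counts with the planned split."""
    total = control + treatment
    if total <= 0 or not 0.0 < expected_control_share < 1.0:
        raise ValueError("need a positive sample and a share strictly between 0 and 1")
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = ((control - expected_control) ** 2 / expected_control
            + (treatment - expected_treatment) ** 2 / expected_treatment)
    # For 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def has_srm(control: int, treatment: int,
            expected_control_share: float = 0.5,
            alpha: float = 0.001) -> bool:
    """Flag the experiment when the observed split is implausible under the plan.
    alpha is an assumed alert threshold, not a value taken from the source."""
    return srm_p_value(control, treatment, expected_control_share) < alpha


if __name__ == "__main__":
    print(has_srm(5050, 4950))    # small imbalance on a 50/50 plan
    print(has_srm(50000, 48000))  # large imbalance on a 50/50 plan
```

A strict threshold is a typical choice for this kind of alert because the check runs on every experiment, so a looser one would page on noise.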

Key systems

  • systems/zalando-postgres-operator — the flagship Kubernetes operator; release 1.5 (2020) introduced the built-in PgBouncer connection-pooling feature that motivates the first ingest.
  • systems/skipper-proxy — Go HTTP router / reverse proxy, default Kubernetes Ingress across 140+ clusters; reused to serve engineering.zalando.com via a single route annotation (compress() + setDynamicBackendUrl) that proxies to an S3 website endpoint.
  • systems/kube-ingress-aws-controller — auto-provisions AWS ALB + ACM cert per Ingress.
  • systems/external-dns — SIG Kubernetes controller used in combination with the above for end-to-end Ingress → ALB → DNS automation.
  • systems/octopus-zalando-experimentation-platform — in-house A/B testing platform; 2015–present; three subsystems (management, execution, analysis); analysis rebuilt on systems/apache-spark.
  • systems/randomizer-swift — open-source Swift library for randomised-input testing. Random protocol + Standard Library conformances + user-type extension point. Authored by Vijaya Kandel (Zalando Mobile, iOS). Used inside Zalando's iOS codebase.
  • systems/testcontainers — Zalando Marketing Services canonicalises the JVM / Java / Spring Boot altitude use pattern: singleton PostgreSQLContainer on a base class, @DynamicPropertySource-wired into Spring, amortised across all ITs in the JVM. Complements the existing Canva CI-framework altitude Seen-in.
  • systems/localstack · systems/mockserver · systems/wiremock · systems/ryuk-testcontainers-reaper — companion containers called out in the ZMS post.
  • systems/junit5 · systems/maven-surefire-plugin · systems/maven-failsafe-plugin · systems/spring-boot — the JVM test stack Zalando ZMS plugs Testcontainers into.
  • systems/zalando-load-test-conductor — Go microservice built by the Payments department (2021-03) to own the full lifecycle of an end-to-end load test: production-version cloning, multi-substrate scaling (Kubernetes + AWS ECS), KPI-closed-loop Locust steering, scale-down, and data cleanup. Exposes a declarative Kubernetes-inspired API; invoked both manually and via Kubernetes CronJob.
  • systems/locust · systems/hoverfly — the Payments department's chosen open-source traffic generator and API mocking tool. Locust over Vegeta / JMeter on developer-familiarity; Hoverfly over Wiremock / MockServer on record-and-replay + stateful behaviour + language-agnostic deployment.
  • systems/nakadi — Zalando's open-source event bus (Kafka wrapper with REST + schema registry); named in the 2021-03 post as a centrally-managed event queue whose test-cluster parity required cross-team alignment.
  • systems/opentracing — the distributed tracing substrate powering concepts/adaptive-paging (from the 2020-10 Cyber-Week retrospective).
  • systems/zalando-graphql-ubff — Zalando's single-service Unified Backend-For-Frontends GraphQL; in production since end of 2018; 12+ domains, 200+ consuming developers, 25-30 feature teams; >80% Web / >50% App coverage.
  • systems/graphql-jit — Zalando's in-house open-source JIT-compiled GraphQL executor (zalando-incubator/graphql-jit), the execution engine the UBFF runs on.
  • systems/graphql — the query-language substrate.
  • systems/zalando-interface-framework — second-generation frontend platform; designed 2018, ~90% of zalando.com traffic by March 2021. Supersedes the Mosaic Fragment architecture with an entity-based page-composition model.
  • systems/zalando-mosaic — the 2015-era Fragment-based micro-frontend architecture Zalando retrospectively critiques; retained via hybrid Rendering-Engine modes during the migration to IF.
  • systems/zalando-rendering-engine — the Node.js + browser runtime at the heart of IF; resolves Entity trees into Renderer trees using declarative rendering rules.
  • systems/zally — Zalando's open-source OpenAPI linter that codifies the RESTful API Guidelines. MUST-severity rules gate CI builds — the enforcement point of API-first at Zalando.
  • systems/fabric-gateway-zalando — Zalando's declarative Kubernetes API gateway built on top of Skipper; one of the three default AuthN/AuthZ options for new backend services.
  • systems/opentracing-toolbox — Zalando's Java/Kotlin integration library for OpenTracing; named as the Kotlin Guild's default tracing library with a dedicated Kotlin submodule.
  • systems/zalando-mdm-system — Zalando's in-house Master Data Management component (in-design as of mid-2021). Uses a knowledge graph in Neo4j as the design-time authoring substrate; Python script walks the graph to generate the logical data model of the golden record plus per-source-system transformation data models. Consolidated-style MDM scoped to "tens of tables and hundreds of columns".
  • systems/neo4j — the property-graph database used as the knowledge-graph store and visualisation tool for the MDM modeling work. Chosen explicitly for "best look-and-feel" / domain-expert communication, not query-path semantics.
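
The KPI-closed-loop steering attributed to systems/zalando-load-test-conductor can be pictured as a feedback loop: measure orders per minute, compare with the target, adjust the number of simulated users. The sketch below is a hypothetical proportional controller written for illustration. The function name, the step limit, and the user cap are assumptions, and the algorithm in the Zalando post may differ.

```python
def next_user_count(current_users: int,
                    observed_opm: float,
                    target_opm: float,
                    max_step: float = 0.5,
                    max_users: int = 10_000) -> int:
    """Propose the simulated-user count for the next control interval.

    Scales the current count by target/observed orders per minute, but never
    changes it by more than max_step (as a fraction) in a single interval, so
    one noisy KPI reading cannot cause a sudden load spike or collapse.
    """
    if current_users < 1 or target_opm <= 0 or not 0.0 < max_step < 1.0:
        raise ValueError("invalid controller input")
    if observed_opm <= 0:
        # No orders observed yet: ramp up by the largest allowed step.
        proposed = current_users * (1.0 + max_step)
    else:
        proposed = current_users * target_opm / observed_opm
    lower = current_users * (1.0 - max_step)
    upper = current_users * (1.0 + max_step)
    clamped = min(max(proposed, lower), upper)
    return max(1, min(max_users, round(clamped)))


if __name__ == "__main__":
    users = 100
    # Hypothetical KPI readings against a target of 1000 orders per minute.
    for observed in (400.0, 700.0, 950.0, 1000.0):
        users = next_user_count(users, observed, target_opm=1000.0)
        print(observed, users)
```

In a real setup the returned count would be handed to the traffic generator each interval; how the Load Test Conductor passes it to Locust is not described on this page.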

Key concepts and patterns surfaced

Postgres-on-Kubernetes / kernel-level latency axis:

Experimentation-platform axis:

Mobile testing discipline axis:

JVM integration testing discipline axis:

Cyber-Week prep / load-test automation axis:

Unified GraphQL BFF / API platform axis:

JVM language governance / Kotlin ADOPT ring axis:

MDM / knowledge-graph-driven data modeling axis:

  • concepts/master-data-management — the enclosing discipline; "technology-enabled discipline in which business and Information Technology work together to ensure the uniformity, accuracy, stewardship, semantic consistency and accountability of the enterprise's official shared master data assets." Zalando chose consolidated-style MDM.
  • concepts/golden-record — "a common, shared, and trusted view on data for a particular domain"; the output of consolidation over source systems.
  • concepts/logical-data-model — the schema of the golden record; generated from the knowledge graph rather than authored directly.
  • concepts/transformation-data-model — per-source-system mapping showing direct (1-to-1) vs. indirect (1-to-many, transformation-function) column → concept mappings. The worked System A / System B Address example (free-text address lines vs. structured street / zip / city / country_code fields) illustrates both mapping types.
  • concepts/semantic-layer-of-business-concepts — the graph of Concept / Attribute / Relationship nodes between source schemas and the target logical data model; the "shared conceptual vocabulary" that makes business-engineering alignment tractable.
  • concepts/knowledge-graph — extended with Zalando MDM as its third canonical wiki instance (alongside Dropbox Dash retrieval substrate and Netflix UDA enterprise-data-integration substrate). A new H2 in the knowledge-graph concept page contrasts the three altitudes.
  • concepts/data-lineage — extended with Zalando MDM as a design-time byproduct Seen-in, complementary to the existing Meta (enforcement-primitive) and Redpanda (agent-interaction envelope) framings.
  • patterns/knowledge-graph-for-mdm-modeling — the core pattern; System / Table / Column / Concept / Attribute / Relationship node schema, Python generator, direct vs. indirect mappings.
  • patterns/mapping-driven-schema-generation — the generalised pattern across MDM, Netflix UDA, and dbt-style data-build tools: make the mapping authoritative and derive both target schema and transformation code.
  • patterns/visual-graph-for-business-engineering-alignment — Neo4j-rendered graph diagrams as the primary business-engineering communication artifact, replacing SQL DDL / spreadsheets. Named by the post as the #1 benefit.
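
The idea behind patterns/mapping-driven-schema-generation, with both the logical data model and the per-source transformation model derived from a single set of mappings, can be sketched in a few lines. This is an illustration only: an in-memory list stands in for the Neo4j graph, and the Mapping class, the function names, the system and column names, and the parse_address_line transformation are all hypothetical, loosely modelled on the System A / System B Address example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mapping:
    system: str
    table: str
    column: str
    concept: str
    attributes: tuple             # one attribute = 1-to-1, several = 1-to-many
    transform: Optional[str] = None  # name of a transformation function, if any


# Hypothetical mappings: System A stores a free-text address line,
# System B stores structured fields.
MAPPINGS = [
    Mapping("system_a", "customer", "address_line", "Address",
            ("street", "zip", "city"), transform="parse_address_line"),
    Mapping("system_b", "address", "street", "Address", ("street",)),
    Mapping("system_b", "address", "zip", "Address", ("zip",)),
    Mapping("system_b", "address", "city", "Address", ("city",)),
    Mapping("system_b", "address", "country_code", "Address", ("country_code",)),
]


def logical_model(mappings):
    """Golden-record schema: every concept with the union of mapped attributes."""
    model = defaultdict(set)
    for m in mappings:
        model[m.concept].update(m.attributes)
    return {concept: sorted(attrs) for concept, attrs in sorted(model.items())}


def transformation_model(mappings, system):
    """Per-source rows: (source column, target attribute, mapping kind, transform)."""
    rows = []
    for m in mappings:
        if m.system != system:
            continue
        is_direct = len(m.attributes) == 1 and m.transform is None
        kind = "direct" if is_direct else "indirect"
        for attr in m.attributes:
            rows.append((f"{m.table}.{m.column}", f"{m.concept}.{attr}",
                         kind, m.transform))
    return rows


if __name__ == "__main__":
    print(logical_model(MAPPINGS))
    for row in transformation_model(MAPPINGS, "system_a"):
        print(row)
```

Because both outputs come from the same mappings, adding a source column changes the golden-record schema and the transformation model together, which is the property the pattern is after.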

Recent articles

Last updated · 476 distilled / 1,218 read