ZALANDO 2025-06-29

Zalando — Building a Dynamic Inventory Optimisation System: A Deep Dive

Summary

Zalando (2025-06-29) documents the architecture of ZEOS's AI-driven replenishment-recommendation system — a two-stage machine-learning pipeline that produces probabilistic weekly demand forecasts for 5 million SKUs and feeds them into a Monte Carlo simulation + gradient-free black-box optimiser that recommends what / when / where to replenish across a multi-echelon warehouse network. The system is offered to partners via the Zalando B2B partner portal, which exposes both daily batch recommendations and a real-time interactive optimisation endpoint so partners can adjust inventory settings and re-score on the fly. The post cleanly separates two concerns most MLE posts conflate: (a) a Demand Forecaster pipeline that runs weekly and produces a 12-week-ahead probabilistic forecast per (article_id, merchant_id, week), and (b) an Inventory Optimiser pipeline that runs daily in batch and on-demand online — both modes using the same optimisation algorithm and the same input feature vectors for consistency. Both pipelines are built on zFlow (Zalando's internal ML platform, which compiles Python workflows to AWS Step Functions via CDK + CloudFormation), with feature engineering split between Databricks + PySpark (data pre-processing over Delta Lake on transient job clusters) and [[systems/aws-sagemaker-ai|SageMaker]] Processing Jobs (vectorised feature transformation). Training and inference for LightGBM-based forecasts run in a single SageMaker Training Job — no separate inference infra, no checkpointing — because the model is lightweight enough. The core cost / latency claim: under 2 hours end-to-end for the weekly forecast pipeline (3 years of history × 5M SKUs, sliding window), via a "deliberate focus on data model design and I/O efficiency."
The inventory-optimisation pipeline uses a SageMaker Feature Store in both its online and offline modes — offline (S3-backed, append mode) for batch pipelines + long-term retention + debugging; online (low-latency, 10–20 ms read/write per SKU) for interactive use. Online requests flow through SQS → AWS Lambda where a worker pool executes the optimiser with multi-threading parallelism per inventory-setting update. Daily batch recommendations are computed via SageMaker Batch Transform followed by a SageMaker Processing Job that evaluates optimisation performance — proactive drift monitoring via AWS CloudWatch alarms + Lambda. An interesting architectural commitment: the same algorithm + feature vectors are used on both the online and offline paths, so partners' ad-hoc what-if queries and the daily batch report always agree.

Key takeaways

  1. Replenishment framed as cost optimisation under uncertainty. Objective:

$$\min_{\theta}\ Costs(\theta) = C_{storage}(\theta) + C_{lost\ sales}(\theta) + C_{overstock}(\theta) + C_{operations}(\theta) + C_{inbound}(\theta)$$

Balance: "long-term cost of overstock with the short-term cost of lost sales" while satisfying operational constraints (lead times, review frequency) and capturing "the stochastic nature of the decision-making process." Solved via Monte Carlo simulations + black-box gradient-free optimisers rather than closed-form gradient descent.
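The post does not name the optimiser family or the Monte Carlo sample budget, so the following is a minimal stdlib-only sketch of the pattern: estimate the cost objective by simulating demand draws, then search the setting space with a gradient-free method (plain random search here; all coefficients, the single-parameter θ, and the demand distribution are illustrative assumptions, not Zalando's).

```python
import random
import statistics

def monte_carlo_cost(theta, n_samples=200, seed=0):
    """Estimate expected total cost for a replenishment setting `theta`
    (here a single order-up-to level) by simulating weekly demand draws.
    Cost coefficients are illustrative, not from the post."""
    rng = random.Random(seed)
    costs = []
    for _ in range(n_samples):
        demand = rng.gauss(100, 30)           # probabilistic weekly demand
        lost_sales = max(0.0, demand - theta)
        overstock = max(0.0, theta - demand)
        costs.append(0.2 * theta              # C_storage
                     + 5.0 * lost_sales       # C_lost_sales
                     + 1.0 * overstock)       # C_overstock
    return statistics.mean(costs)

def random_search(cost_fn, lo=0.0, hi=400.0, iters=500, seed=42):
    """Gradient-free black-box optimisation: sample candidate settings
    and keep the one with the best Monte Carlo cost estimate."""
    rng = random.Random(seed)
    best_theta, best_cost = None, float("inf")
    for _ in range(iters):
        theta = rng.uniform(lo, hi)
        cost = cost_fn(theta)
        if cost < best_cost:
            best_theta, best_cost = theta, cost
    return best_theta, best_cost

theta_star, cost_star = random_search(monte_carlo_cost)
```

Because lost sales are penalised far more heavily than overstock in this toy setup, the optimum lands well above mean demand — the "balance long-term overstock against short-term lost sales" trade-off in miniature.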

  2. Two-stage pipeline, rigidly separated. Step 1 gathers inputs (probabilistic demand forecasts, returns lead-time forecasts, shipment lead times, per-item economics, latest stock state, stock in transit); step 2 feeds all inputs into the recommendation engine. "We break the inventory optimisation problem into two isolated but connected building blocks: Demand Forecast and Inventory Optimisation." Canonical instance of patterns/two-stage-forecast-plus-optimisation-pipeline.

  3. Demand forecasting scale: 5M SKUs, 3 years of history, under 2 hours. "Our weekly forecasting pipeline processes 3 years of historical data for 5 million SKUs (size and colour) using a sliding window approach, and takes less than 2 hours. This high performance pipeline is enabled by a deliberate focus on data model design and I/O efficiency." Canonical numbers on the wiki for probabilistic-forecast batch-size.

  4. Feature engineering split: pre-processing vs transformation. Two explicitly named stages with different tooling:

| | Data Pre-Processing | Data Transformation |
|---|---|---|
| Objective | Model upstream data into a human-understandable time-series representation | Engineer features that maximise predictive signal for model training |
| Example ops | Joins, filters, aggregations | Encoding, normalisation |
| Tools | PySpark, Spark-SQL on Databricks over Delta Lake / transient job clusters | Pandas, scikit-learn, NumPy, Numba in a SageMaker Processing Job |
| Scaling | Horizontal (add worker nodes as volume grows) | Vertical (dependent libraries lack native distribution) |
| Why it's fast | Distributed processing; avoids complex statistical feature engineering | Operates on pre-processed data, not raw events |

This taxonomy is load-bearing architectural vocabulary for the whole system — forecasting-specific feature generation (target lags / transformations, exogenous features, temporal features) is handled by Nixtla's MLForecast with Numba under the hood, not by the two tiers above.
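The two tiers can be sketched end-to-end in miniature (stdlib only; in production tier 1 is PySpark over Delta Lake and tier 2 is Pandas / scikit-learn / Numba in a SageMaker Processing Job — field names and the normalisation choice here are illustrative assumptions):

```python
from collections import defaultdict
from datetime import date

# --- Tier 1: data pre-processing. Joins / filters / aggregations that turn
# raw sales events into a human-understandable weekly series per SKU.
raw_events = [
    {"article_id": "A1", "merchant_id": "M1", "day": date(2025, 1, 6), "units": 3},
    {"article_id": "A1", "merchant_id": "M1", "day": date(2025, 1, 8), "units": 2},
    {"article_id": "A1", "merchant_id": "M1", "day": date(2025, 1, 14), "units": 7},
]

weekly = defaultdict(int)
for ev in raw_events:
    iso = ev["day"].isocalendar()                       # aggregate to ISO week
    key = (ev["article_id"], ev["merchant_id"], (iso[0], iso[1]))
    weekly[key] += ev["units"]

# --- Tier 2: data transformation. Encoding / normalisation applied to the
# pre-processed series, not to raw events — which is why it stays fast
# even without distributed execution.
values = list(weekly.values())
mean = sum(values) / len(values)
scale = (max(values) - min(values)) or 1
features = {k: (v - mean) / scale for k, v in weekly.items()}
```

The structural point is the hand-off: tier 2 never sees raw events, only the compact time-series representation tier 1 emits.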

  5. Model: LightGBM + Nixtla MLForecast, not deep learning. After "extensive experimentation with deep learning models like TFT and other machine learning approaches," Zalando settled on LightGBM via Nixtla's MLForecast. Rationale verbatim: "high-level abstractions for time series-specific feature generation with optimised performance, rapid prototyping through shorter feedback loops, and access to a robust, well-maintained open-source ecosystem." Architecture payoff: "Due to the ML model's lightweight training footprint, we bypass complexity, like for example not needing checkpointing, or separate infrastructure for inference. Instead, model training as well as model inference are executed in a single pipeline using AWS SageMaker Training Jobs." Canonical wiki instance of patterns/single-sagemaker-training-job-train-and-infer.

  6. Output cardinality: (article_id, merchant_id, week) × 12 weeks, probabilistic. "The final output of this stage is a 12-week probabilistic demand forecast for each (article_id, merchant_id, week) combination." Each time series is keyed by the composite key — two merchants selling the same article get a separate series each. See concepts/skus-as-time-series-unit.
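A hypothetical record shape for that output — one series per (article_id, merchant_id), twelve weekly steps, each step a set of quantiles rather than a point (quantile levels and values are illustrative, not from the post):

```python
# Composite-key forecast output: keyed by (article_id, merchant_id),
# 12 steps ahead, probabilistic via per-step quantiles.
forecast = {
    ("A1", "M1"): [
        {"week_ahead": h, "q10": 40 + h, "q50": 55 + h, "q90": 75 + h}
        for h in range(1, 13)
    ]
}

horizon = forecast[("A1", "M1")]
# Downstream consumers (the optimiser) read the full distribution,
# e.g. median demand over the horizon:
median_total = sum(step["q50"] for step in horizon)
```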

  7. Drift monitoring via CloudWatch + Lambda. Post-processing is a SageMaker Processing Job that computes "statistical analysis of model performance and the computation of key business metrics"; these metrics flow into AWS CloudWatch alarms + Lambda functions that deliver alerts to operator channels. First-class concepts/model-drift-monitoring pattern — monitoring is not bolted on, it is a pipeline step.

  8. Feature vector composition (inventory side). Each SKU gets a feature vector containing: historical outbound data, inventory state, inbound volumes, pricing information, article metadata, cost factors, return lead-time weights, and 12-week probabilistic demand forecasts. This vector is the boundary object between the two pipelines.

  9. Dual-mode feature store. Same vectors, two stores:

  • Offline: S3-backed, append mode; intended for cold storage, batch pipelines, archiving, debugging; latency "in the order of minutes." Stores daily datapoints and the updated vectors produced by inventory-setting changes, so the long-term audit trail is complete.
  • Online: optimised for low-latency, low-throughput lookup of only the latest valid feature vector; 10–20 ms read/write per SKU; serves both batch input generation and the online interactive endpoint.

See concepts/online-vs-offline-feature-store and patterns/online-plus-offline-feature-store-parity.
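The write semantics of the split — online overwrites to keep only the latest valid vector, offline appends everything — can be modelled in a few lines (an illustrative in-memory stand-in; the real system uses SageMaker Feature Store):

```python
import time

class DualModeFeatureStore:
    """Toy model of the online/offline split: the online store keeps only
    the latest feature vector per SKU; the offline store appends every
    write for long-term retention and debugging."""

    def __init__(self):
        self._online = {}        # sku -> latest vector (10-20 ms KV in prod)
        self._offline = []       # append-only log (S3-backed in prod)

    def put(self, sku, vector):
        ts = time.time()
        self._online[sku] = vector               # overwrite: latest wins
        self._offline.append((ts, sku, vector))  # append: full history kept

    def get_online(self, sku):
        return self._online[sku]                 # latest valid vector only

    def history(self, sku):
        return [v for _, s, v in self._offline if s == sku]

store = DualModeFeatureStore()
store.put("A1-M1", {"demand_q50": 55, "stock": 120})
store.put("A1-M1", {"demand_q50": 60, "stock": 110})  # a setting update
```

A single `put` feeding both stores mirrors why the audit trail stays complete: every online overwrite is also an offline append.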

  10. Online and offline deliveries share the same algorithm and features. "It's important to note that the inventory optimisation algorithm and input features are synchronised between the two subsystems (online and offline), ensuring consistency across both engines." Zalando treats this as a correctness-level invariant — partners' interactive recommendations and their daily batch report must never disagree for the same inputs.

  11. Offline delivery: SageMaker Batch Transform + Processing Job. Daily recommendation reports run via SageMaker Batch Transform using the latest inventory setting for all merchants and articles; followed by a post-processing layer in a SageMaker Processing Job that evaluates optimisation performance. Results land in S3 and a "report generated" notification is published to the respective event stream.

  12. Online delivery: SQS → Lambda → multi-threaded optimiser. "When partners update their inventory settings, we trigger an orchestrated workflow that queues each update request on AWS SQS. We then use AWS Lambda to poll the queue for updates and serve each update request asynchronously. For each inventory update, we fetch the feature vector for relevant SKUs from the online feature store, and execute the optimisation algorithm with multi-threading parallelism." Canonical patterns/async-sqs-lambda-for-interactive-optimisation instance: optimisation output lands in S3 + an event-stream notification; the inventory-setting update is also written back to the offline feature store so future offline predictions remain consistent with the online what-if. See concepts/single-pipeline-train-and-infer and the sibling pattern patterns/proactive-cache-daily-batch-prediction.
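A Lambda-style handler for that flow might look like the following sketch — the SQS event record shape is the standard one, but `fetch_feature_vector`, `optimise`, and the payload fields are hypothetical placeholders for the feature-store read and the Monte Carlo optimiser:

```python
import json
from concurrent.futures import ThreadPoolExecutor

def fetch_feature_vector(sku):
    """Placeholder for an online feature-store read
    (10-20 ms per SKU in the post)."""
    return {"sku": sku, "demand_q50": 50.0}

def optimise(vector):
    """Placeholder for the Monte Carlo + black-box optimisation step."""
    return {"sku": vector["sku"],
            "recommended_qty": round(vector["demand_q50"] * 1.2)}

def handler(event, context=None):
    """Processes SQS-delivered inventory-setting updates: each record
    names the affected SKUs; feature fetch and optimisation run across
    them with thread-level parallelism."""
    results = []
    for record in event["Records"]:              # SQS event envelope
        update = json.loads(record["body"])
        with ThreadPoolExecutor(max_workers=8) as pool:
            vectors = pool.map(fetch_feature_vector, update["skus"])
            results.extend(pool.map(optimise, vectors))
    return results
```

In the real system the results would then be written to S3, published to the event stream, and the setting change persisted back to the offline feature store.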

  13. Three named scalability levers (verbatim). (1) Robust Pipelines: dedicated Databricks Job clusters + SageMaker processing / training jobs per run — "a failure of one execution in the Databricks job cluster does not impact a parallel execution." (2) Fast data and vectorised transformations: PySpark + Numba + Joblib multi-core; Numba speedups "by a factor of 2 or 3 compared to NumPy." (3) Light models: LightGBM with conformal inference via Nixtla MLForecast — "the benefits of using a library like Nixtla, which can automate many time series features and processes required just before training."
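Lever (2) is about replacing per-row Python loops with vectorised kernels. A minimal NumPy illustration of the idea (lag-feature construction; Numba would JIT-compile kernels like the loop version for the further 2–3× the post reports — the function names here are illustrative):

```python
import numpy as np

def lag_features_loop(y, lags):
    """Per-series lag features via a plain Python loop (slow path)."""
    out = np.full((len(y), len(lags)), np.nan)
    for i in range(len(y)):
        for j, lag in enumerate(lags):
            if i - lag >= 0:
                out[i, j] = y[i - lag]
    return out

def lag_features_vectorised(y, lags):
    """Same features via NumPy slicing: one shifted assignment per lag
    instead of a per-element loop."""
    out = np.full((len(y), len(lags)), np.nan)
    for j, lag in enumerate(lags):
        out[lag:, j] = y[:len(y) - lag]
    return out

y = np.arange(10, dtype=float)
```

Both produce identical feature matrices; only the inner-loop cost differs, which is what matters at 5M-series scale.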

Architecture

Pipeline 1 — Demand forecaster (weekly)

Flow (right-to-left per the post's Figure 2):

  1. Feature Engineering — data pre-processing (PySpark on Databricks transient job clusters, writing to Delta Lake; 2.5-year timeframe to capture seasonal patterns without overemphasising old data) followed by data transformation (SageMaker Processing Job; Pandas / scikit-learn / NumPy / Numba). Forecasting-specific features (target lags, exogenous lags, temporal features) are handed off to MLForecast.

  2. Model Training + Predictions — single SageMaker Training Job runs LightGBM via Nixtla MLForecast; produces a 12-week probabilistic forecast per (article_id, merchant_id, week). No checkpointing, no separate inference infrastructure.

  3. Post Processing — SageMaker Processing Job builds a time-series representation suitable for the downstream optimiser, plus statistical model-performance analysis and key business-metric computation. CloudWatch alarms + Lambda route alerts.

Wall-clock: under 2 hours end-to-end for 3 years × 5M SKUs via sliding window.
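The single-job train-and-infer pattern from step 2 reduces to one entrypoint that fits, predicts, and emits artefacts in the same process. A pure-Python sketch with a trivial per-series mean model standing in for LightGBM-via-MLForecast (the point is the job structure, not the model):

```python
def train(history):
    """'Training': fit a per-series level. In production this is
    LightGBM via Nixtla MLForecast inside a SageMaker Training Job."""
    return {key: sum(ys) / len(ys) for key, ys in history.items()}

def predict(model, horizon=12):
    """Inference in the same process: 12 weekly steps per series."""
    return {key: [level] * horizon for key, level in model.items()}

def main(history):
    """Single-job entrypoint: train, then immediately run inference and
    return artefacts — no checkpointing, no separate serving stack."""
    model = train(history)
    return predict(model)

forecasts = main({("A1", "M1"): [4.0, 6.0, 5.0]})
```

Because the trained model never leaves the job, there is nothing to checkpoint, deploy, or version for serving — the trade-off is that predictions are only as fresh as the weekly run.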

Pipeline 2 — Inventory optimiser (daily batch + online)

  1. Feature Generation — same two-tier split (PySpark on Databricks for SQL-expressible transforms; SciPy / NumPy operations in SageMaker Processing Job). Output: per-SKU feature vector with historical outbound, inventory state, inbound volumes, pricing, article metadata, cost factors, return lead-time weights, and the 12-week probabilistic demand forecasts from pipeline 1.

  2. Feature Store — SageMaker Feature Store in both online and offline modes; offline append-only (daily + user-triggered updates) on S3; online ≤ 20 ms latency per SKU.

  3. Optimisation — Monte Carlo simulation + black-box gradient-free optimiser; one algorithm, two delivery mechanisms.

  4. Offline (batch): daily SageMaker Batch Transform over all (merchant, article) pairs; SageMaker Processing Job for post-processing + performance evaluation; S3 output + "report generated" event.

  5. Online (interactive): partner portal → SQS → Lambda worker pool → multi-threaded optimiser per update → S3 output + backend event-stream notification; update is also persisted to the offline feature store for future consistency.

Platform substrate

Both pipelines run on zFlow, Zalando's internal ML platform. zFlow provides:

  • Seamless AWS + Databricks integration — removes infrastructure-code overhead so teams can focus on ML-application logic.
  • In-transit and at-rest encryption for all artifacts by default.
  • Orchestration via AWS Step Functions (zFlow compiles Python DSL → Step Functions state machine via AWS CDK-generated CloudFormation).

Canonical earlier disclosure at sources/2022-04-18-zalando-zalandos-machine-learning-platform and sources/2021-02-15-zalando-a-machine-learning-pipeline-with-real-time-inference.

Numbers

| Quantity | Value | Source |
|---|---|---|
| SKUs in the weekly forecast | 5 million | post |
| Historical data window | 3 years (sliding window); 2.5-year timeframe used for seasonal capture | post |
| Forecast horizon | 12 weeks per (article_id, merchant_id, week) | post |
| Forecast distribution | Probabilistic (not point) | post |
| Forecast pipeline wall-clock | < 2 hours end-to-end | post |
| Online feature store latency | 10–20 ms per SKU, read/write | post |
| Offline feature store latency | "in the order of minutes" | post |
| Numba speedup vs NumPy | "2–3×" | post |
| Forecast cadence | Weekly | post |
| Optimisation cadence | Daily batch + real-time online | post |
| Catalogue complexity (problem frame) | Up to millions of articles; dozens of warehouses across multiple countries; seasonal rotating catalogue | post |

Extracted systems

Extracted concepts

Extracted patterns

Caveats / gaps

  • Partner portal technology stack unspecified. The post says partners get a "partner portal" giving a "holistic picture of inventory health and other metrics and KPIs" but does not disclose its underlying serving framework, caching tier, auth model, or latency numbers — only the online feature-store number (10–20 ms per SKU).
  • Black-box optimiser family not named. The post says "black-box gradient-free optimisers" without naming the algorithm family (CMA-ES, Bayesian optimisation, simulated annealing, Powell's method, etc.); evaluation budget per SKU is not disclosed either.
  • Monte Carlo sample size not disclosed. The objective cites Monte Carlo simulations, but the number of samples per θ evaluation is not quoted — a core cost lever for the daily batch run.
  • Warehouse multi-echelon logic not drilled into. The post mentions "dozens of warehouses spread across several countries" as part of the problem, but the output from the optimiser regarding where inventory should be allocated across that network is not detailed — only the high-level problem framing.
  • Feature-vector size (bytes) not disclosed. Online store latency per SKU is quoted as 10–20 ms — reasonable for a low-throughput KV store — but without feature-vector size or number of SKUs per interactive request, throughput calculations are not possible.
  • 12-week forecast wall-clock not decomposed. The post's "< 2 hours" figure is the full pipeline; no breakdown between pre-processing, transformation, training, and post-processing is given.
  • Conformal inference is mentioned but not explained. "Nixtla's MLForecast with conformal inference, and LightGBM for probabilistic forecasts" — the interaction between LightGBM point forecasts and conformal-based prediction intervals is not elaborated.
  • No mention of cost. Unlike the 2022 ML Platform post which calls out "low total cost of ownership," this post does not quote AWS bill figures for the 5M-SKU weekly run or the daily batch run.
  • Backfill / retraining strategy. Retraining cadence is weekly (full retrain) per the forecast pipeline; there's no mention of online-learning or incremental retraining strategies. Presumably full retrain every week is acceptable because the pipeline is under 2 hours.
  • Failure mode of the online path. If SQS backs up or Lambda concurrency hits a ceiling, the post doesn't describe fallback UX — does the partner portal fall back to the cached daily batch answer, or does it surface an error? Not addressed.

Wiki positioning

Source
