ZALANDO 2025-06-29

Zalando — Building a Dynamic Inventory Optimisation System: A Deep Dive

Summary

Zalando (2025-06-29) documents the architecture of ZEOS's AI-driven replenishment-recommendation system — a two-stage machine-learning pipeline that produces probabilistic weekly demand forecasts for 5 million SKUs and feeds them into a Monte Carlo simulation + gradient-free black-box optimiser that recommends what / when / where to replenish across a multi-echelon warehouse network. The system is offered to partners via the Zalando B2B partner portal, which exposes both daily batch recommendations and a real-time interactive optimisation endpoint so partners can adjust inventory settings and re-score on the fly. The post cleanly separates two concerns most MLE posts conflate: (a) a Demand Forecaster pipeline that runs weekly and produces a 12-week-ahead probabilistic forecast per (article_id, merchant_id, week), and (b) an Inventory Optimiser pipeline that runs daily in batch and on-demand online — both modes using the same optimisation algorithm and the same input feature vectors for consistency. Both pipelines are built on zFlow (Zalando's internal ML platform, which compiles Python workflows to AWS Step Functions via CDK + CloudFormation), with feature engineering split between Databricks + PySpark (data pre-processing over Delta Lake on transient job clusters) and [[systems/aws-sagemaker-ai|SageMaker]] Processing Jobs (vectorised feature transformation). Training and inference for LightGBM-based forecasts run in a single SageMaker Training Job — no separate inference infra, no checkpointing — because the model is lightweight enough. The core cost / latency claim: under 2 hours end-to-end for the weekly forecast pipeline (3 years of history × 5M SKUs, sliding window), via a "deliberate focus on data model design and I/O efficiency."
The inventory-optimisation pipeline uses a SageMaker Feature Store in both its online and offline modes — offline (S3-backed, append mode) for batch pipelines + long-term retention + debugging; online (low-latency, 10–20 ms read/write per SKU) for interactive use. Online requests flow through SQS → AWS Lambda where a worker pool executes the optimiser with multi-threading parallelism per inventory-setting update. Daily batch recommendations are computed via SageMaker Batch Transform followed by a SageMaker Processing Job that evaluates optimisation performance — proactive drift monitoring via AWS CloudWatch alarms + Lambda. An interesting architectural commitment: the same algorithm + feature vectors are used on both the online and offline paths, so partners' ad-hoc what-if queries and the daily batch report always agree.

Key takeaways

  1. Replenishment framed as cost optimisation under uncertainty. Objective:

$$\min_{\theta}\ Costs(\theta) = C_{storage}(\theta) + C_{lost\ sales}(\theta) + C_{overstock}(\theta) + C_{operations}(\theta) + C_{inbound}(\theta)$$

Balance: "long-term cost of overstock with the short-term cost of lost sales" while satisfying operational constraints (lead times, review frequency) and capturing "the stochastic nature of the decision-making process." Solved via Monte Carlo simulations + black-box gradient-free optimisers rather than closed-form gradient descent.
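The post does not name the optimiser family or the Monte Carlo sample budget, so the following is a minimal stdlib-only sketch of the pattern: estimate the cost objective by simulating demand draws, then search the setting space with a gradient-free method (plain random search here; all coefficients, the single-parameter θ, and the demand distribution are illustrative assumptions, not Zalando's).

```python
import random
import statistics

def monte_carlo_cost(theta, n_samples=200, seed=0):
    """Estimate expected total cost for a replenishment setting `theta`
    (here a single order-up-to level) by simulating weekly demand draws.
    Cost coefficients are illustrative, not from the post."""
    rng = random.Random(seed)
    costs = []
    for _ in range(n_samples):
        demand = rng.gauss(100, 30)           # probabilistic weekly demand
        lost_sales = max(0.0, demand - theta)
        overstock = max(0.0, theta - demand)
        costs.append(0.2 * theta              # C_storage
                     + 5.0 * lost_sales       # C_lost_sales
                     + 1.0 * overstock)       # C_overstock
    return statistics.mean(costs)

def random_search(cost_fn, lo=0.0, hi=400.0, iters=500, seed=42):
    """Gradient-free black-box optimisation: sample candidate settings
    and keep the one with the best Monte Carlo cost estimate."""
    rng = random.Random(seed)
    best_theta, best_cost = None, float("inf")
    for _ in range(iters):
        theta = rng.uniform(lo, hi)
        cost = cost_fn(theta)
        if cost < best_cost:
            best_theta, best_cost = theta, cost
    return best_theta, best_cost

theta_star, cost_star = random_search(monte_carlo_cost)
```

Because lost sales are penalised far more heavily than overstock in this toy setup, the optimum lands well above mean demand — the "balance long-term overstock against short-term lost sales" trade-off in miniature.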

  2. Two-stage pipeline, rigidly separated. Step 1 gathers inputs (probabilistic demand forecasts, returns lead-time forecasts, shipment lead times, per-item economics, latest stock state, stock in transit); step 2 feeds all inputs into the recommendation engine. "We break the inventory optimisation problem into two isolated but connected building blocks: Demand Forecast and Inventory Optimisation." Canonical instance of patterns/two-stage-forecast-plus-optimisation-pipeline.

  3. Demand forecasting scale: 5M SKUs, 3 years of history, under 2 hours. "Our weekly forecasting pipeline processes 3 years of historical data for 5 million SKUs (size and colour) using a sliding window approach, and takes less than 2 hours. This high performance pipeline is enabled by a deliberate focus on data model design and I/O efficiency." Canonical numbers on the wiki for probabilistic-forecast batch-size.

  4. Feature engineering split: pre-processing vs transformation. Two explicitly named stages with different tooling:

| | Data Pre-Processing | Data Transformation |
|---|---|---|
| Objective | Model upstream data into a human-understandable time-series representation | Engineer features that maximise predictive signal for model training |
| Example ops | Joins, filters, aggregations | Encoding, normalisation |
| Tools | PySpark, Spark-SQL on Databricks over Delta Lake / transient job clusters | Pandas, scikit-learn, NumPy, Numba in a SageMaker Processing Job |
| Scaling | Horizontal (add worker nodes as volume grows) | Vertical (dependent libraries lack native distribution) |
| Why it's fast | Distributed processing; avoids complex statistical feature engineering | Operates on pre-processed data, not raw events |

This taxonomy is load-bearing architectural vocabulary for the whole system — forecasting-specific feature generation (target lags / transformations, exogenous features, temporal features) is handled by Nixtla's MLForecast with Numba under the hood, not by the two tiers above.
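The two tiers can be sketched end-to-end in miniature (stdlib only; in production tier 1 is PySpark over Delta Lake and tier 2 is Pandas / scikit-learn / Numba in a SageMaker Processing Job — field names and the normalisation choice here are illustrative assumptions):

```python
from collections import defaultdict
from datetime import date

# --- Tier 1: data pre-processing. Joins / filters / aggregations that turn
# raw sales events into a human-understandable weekly series per SKU.
raw_events = [
    {"article_id": "A1", "merchant_id": "M1", "day": date(2025, 1, 6), "units": 3},
    {"article_id": "A1", "merchant_id": "M1", "day": date(2025, 1, 8), "units": 2},
    {"article_id": "A1", "merchant_id": "M1", "day": date(2025, 1, 14), "units": 7},
]

weekly = defaultdict(int)
for ev in raw_events:
    iso = ev["day"].isocalendar()                       # aggregate to ISO week
    key = (ev["article_id"], ev["merchant_id"], (iso[0], iso[1]))
    weekly[key] += ev["units"]

# --- Tier 2: data transformation. Encoding / normalisation applied to the
# pre-processed series, not to raw events — which is why it stays fast
# even without distributed execution.
values = list(weekly.values())
mean = sum(values) / len(values)
scale = (max(values) - min(values)) or 1
features = {k: (v - mean) / scale for k, v in weekly.items()}
```

The structural point is the hand-off: tier 2 never sees raw events, only the compact time-series representation tier 1 emits.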

  5. Model: LightGBM + Nixtla MLForecast, not deep learning. After "extensive experimentation with deep learning models like TFT and other machine learning approaches," Zalando settled on LightGBM via Nixtla's MLForecast. Rationale verbatim: "high-level abstractions for time series-specific feature generation with optimised performance, rapid prototyping through shorter feedback loops, and access to a robust, well-maintained open-source ecosystem." Architecture payoff: "Due to the ML model's lightweight training footprint, we bypass complexity, like for example not needing checkpointing, or separate infrastructure for inference. Instead, model training as well as model inference are executed in a single pipeline using AWS SageMaker Training Jobs." Canonical wiki instance of patterns/single-sagemaker-training-job-train-and-infer.

  6. Output cardinality: (article_id, merchant_id, week) × 12 weeks, probabilistic. "The final output of this stage is a 12-week probabilistic demand forecast for each (article_id, merchant_id, week) combination." Each time series is keyed by the composite key — two merchants selling the same article get a separate series each. See concepts/skus-as-time-series-unit.
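A hypothetical record shape for that output — one series per (article_id, merchant_id), twelve weekly steps, each step a set of quantiles rather than a point (quantile levels and values are illustrative, not from the post):

```python
# Composite-key forecast output: keyed by (article_id, merchant_id),
# 12 steps ahead, probabilistic via per-step quantiles.
forecast = {
    ("A1", "M1"): [
        {"week_ahead": h, "q10": 40 + h, "q50": 55 + h, "q90": 75 + h}
        for h in range(1, 13)
    ]
}

horizon = forecast[("A1", "M1")]
# Downstream consumers (the optimiser) read the full distribution,
# e.g. median demand over the horizon:
median_total = sum(step["q50"] for step in horizon)
```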

  7. Drift monitoring via CloudWatch + Lambda. Post-processing is a SageMaker Processing Job that computes "statistical analysis of model performance and the computation of key business metrics"; these metrics flow into AWS CloudWatch alarms + Lambda functions that deliver alerts to operator channels. First-class concepts/model-drift-monitoring pattern — monitoring is not bolted on, it is a pipeline step.

  8. Feature vector composition (inventory side). Each SKU gets a feature vector containing: historical outbound data, inventory state, inbound volumes, pricing information, article metadata, cost factors, return lead-time weights, and 12-week probabilistic demand forecasts. This vector is the boundary object between the two pipelines.

  9. Dual-mode feature store. Same vectors, two stores:

  • Offline: S3-backed, append mode; intended for cold storage, batch pipelines, archiving, debugging; latency "in the order of minutes." Stores daily datapoints and the updated vectors produced by inventory-setting changes, so the long-term audit trail is complete.
  • Online: optimised for low-latency, low-throughput lookup of only the latest valid feature vector; 10–20 ms read/write per SKU; serves both batch input generation and the online interactive endpoint.

See concepts/online-vs-offline-feature-store and patterns/online-plus-offline-feature-store-parity.
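The write semantics of the split — online overwrites to keep only the latest valid vector, offline appends everything — can be modelled in a few lines (an illustrative in-memory stand-in; the real system uses SageMaker Feature Store):

```python
import time

class DualModeFeatureStore:
    """Toy model of the online/offline split: the online store keeps only
    the latest feature vector per SKU; the offline store appends every
    write for long-term retention and debugging."""

    def __init__(self):
        self._online = {}        # sku -> latest vector (10-20 ms KV in prod)
        self._offline = []       # append-only log (S3-backed in prod)

    def put(self, sku, vector):
        ts = time.time()
        self._online[sku] = vector               # overwrite: latest wins
        self._offline.append((ts, sku, vector))  # append: full history kept

    def get_online(self, sku):
        return self._online[sku]                 # latest valid vector only

    def history(self, sku):
        return [v for _, s, v in self._offline if s == sku]

store = DualModeFeatureStore()
store.put("A1-M1", {"demand_q50": 55, "stock": 120})
store.put("A1-M1", {"demand_q50": 60, "stock": 110})  # a setting update
```

A single `put` feeding both stores mirrors why the audit trail stays complete: every online overwrite is also an offline append.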

  10. Online and offline deliveries share the same algorithm and features. "It's important to note that the inventory optimisation algorithm and input features are synchronised between the two subsystems (online and offline), ensuring consistency across both engines." Zalando treats this as a correctness-level invariant — partners' interactive recommendations and their daily batch report must never disagree for the same inputs.

  11. Offline delivery: SageMaker Batch Transform + Processing Job. Daily recommendation reports run via SageMaker Batch Transform using the latest inventory setting for all merchants and articles; followed by a post-processing layer in a SageMaker Processing Job that evaluates optimisation performance. Results land in S3 and a "report generated" notification is published to the respective event stream.

  12. Online delivery: SQS → Lambda → multi-threaded optimiser. "When partners update their inventory settings, we trigger an orchestrated workflow that queues each update request on AWS SQS. We then use AWS Lambda to poll the queue for updates and serve each update request asynchronously. For each inventory update, we fetch the feature vector for relevant SKUs from the online feature store, and execute the optimisation algorithm with multi-threading parallelism." Canonical patterns/async-sqs-lambda-for-interactive-optimisation instance: optimisation output lands in S3 + an event-stream notification; the inventory-setting update is also written back to the offline feature store so future offline predictions remain consistent with the online what-if. See concepts/single-pipeline-train-and-infer and the sibling pattern patterns/proactive-cache-daily-batch-prediction.
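A Lambda-style handler for that flow might look like the following sketch — the SQS event record shape is the standard one, but `fetch_feature_vector`, `optimise`, and the payload fields are hypothetical placeholders for the feature-store read and the Monte Carlo optimiser:

```python
import json
from concurrent.futures import ThreadPoolExecutor

def fetch_feature_vector(sku):
    """Placeholder for an online feature-store read
    (10-20 ms per SKU in the post)."""
    return {"sku": sku, "demand_q50": 50.0}

def optimise(vector):
    """Placeholder for the Monte Carlo + black-box optimisation step."""
    return {"sku": vector["sku"],
            "recommended_qty": round(vector["demand_q50"] * 1.2)}

def handler(event, context=None):
    """Processes SQS-delivered inventory-setting updates: each record
    names the affected SKUs; feature fetch and optimisation run across
    them with thread-level parallelism."""
    results = []
    for record in event["Records"]:              # SQS event envelope
        update = json.loads(record["body"])
        with ThreadPoolExecutor(max_workers=8) as pool:
            vectors = pool.map(fetch_feature_vector, update["skus"])
            results.extend(pool.map(optimise, vectors))
    return results
```

In the real system the results would then be written to S3, published to the event stream, and the setting change persisted back to the offline feature store.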

  13. Three named scalability levers (verbatim). (1) Robust Pipelines: dedicated Databricks Job clusters + SageMaker processing / training jobs per run — "a failure of one execution in the Databricks job cluster does not impact a parallel execution." (2) Fast data and vectorised transformations: PySpark + Numba + Joblib multi-core; Numba speedups "by a factor of 2 or 3 compared to NumPy." (3) Light models: LightGBM with conformal inference via Nixtla MLForecast — "the benefits of using a library like Nixtla, which can automate many time series features and processes required just before training."
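Lever (2) is about replacing per-row Python loops with vectorised kernels. A minimal NumPy illustration of the idea (lag-feature construction; Numba would JIT-compile kernels like the loop version for the further 2–3× the post reports — the function names here are illustrative):

```python
import numpy as np

def lag_features_loop(y, lags):
    """Per-series lag features via a plain Python loop (slow path)."""
    out = np.full((len(y), len(lags)), np.nan)
    for i in range(len(y)):
        for j, lag in enumerate(lags):
            if i - lag >= 0:
                out[i, j] = y[i - lag]
    return out

def lag_features_vectorised(y, lags):
    """Same features via NumPy slicing: one shifted assignment per lag
    instead of a per-element loop."""
    out = np.full((len(y), len(lags)), np.nan)
    for j, lag in enumerate(lags):
        out[lag:, j] = y[:len(y) - lag]
    return out

y = np.arange(10, dtype=float)
```

Both produce identical feature matrices; only the inner-loop cost differs, which is what matters at 5M-series scale.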

Architecture

Pipeline 1 — Demand forecaster (weekly)

Flow (right-to-left per the post's Figure 2):

  1. Feature Engineering — data pre-processing (PySpark on Databricks transient job clusters, writing to Delta Lake; 2.5-year timeframe to capture seasonal patterns without overemphasising old data) followed by data transformation (SageMaker Processing Job; Pandas / scikit-learn / NumPy / Numba). Forecasting-specific features (target lags, exogenous lags, temporal features) are handed off to MLForecast.

  2. Model Training + Predictions — single SageMaker Training Job runs LightGBM via Nixtla MLForecast; produces a 12-week probabilistic forecast per (article_id, merchant_id, week). No checkpointing, no separate inference infrastructure.

  3. Post Processing — SageMaker Processing Job builds a time-series representation suitable for the downstream optimiser, plus statistical model-performance analysis and key business-metric computation. CloudWatch alarms + Lambda route alerts.

Wall-clock: under 2 hours end-to-end for 3 years × 5M SKUs via sliding window.
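The single-job train-and-infer pattern from step 2 reduces to one entrypoint that fits, predicts, and emits artefacts in the same process. A pure-Python sketch with a trivial per-series mean model standing in for LightGBM-via-MLForecast (the point is the job structure, not the model):

```python
def train(history):
    """'Training': fit a per-series level. In production this is
    LightGBM via Nixtla MLForecast inside a SageMaker Training Job."""
    return {key: sum(ys) / len(ys) for key, ys in history.items()}

def predict(model, horizon=12):
    """Inference in the same process: 12 weekly steps per series."""
    return {key: [level] * horizon for key, level in model.items()}

def main(history):
    """Single-job entrypoint: train, then immediately run inference and
    return artefacts — no checkpointing, no separate serving stack."""
    model = train(history)
    return predict(model)

forecasts = main({("A1", "M1"): [4.0, 6.0, 5.0]})
```

Because the trained model never leaves the job, there is nothing to checkpoint, deploy, or version for serving — the trade-off is that predictions are only as fresh as the weekly run.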

Pipeline 2 — Inventory optimiser (daily batch + online)

  1. Feature Generation — same two-tier split (PySpark on Databricks for SQL-expressible transforms; SciPy / NumPy operations in SageMaker Processing Job). Output: per-SKU feature vector with historical outbound, inventory state, inbound volumes, pricing, article metadata, cost factors, return lead-time weights, and the 12-week probabilistic demand forecasts from pipeline 1.

  2. Feature Store — SageMaker Feature Store in both online and offline modes; offline append-only (daily + user-triggered updates) on S3; online ≤ 20 ms latency per SKU.

  3. Optimisation — Monte Carlo simulation + black-box gradient-free optimiser; one algorithm, two delivery mechanisms.

  4. Offline (batch): daily SageMaker Batch Transform over all (merchant, article) pairs; SageMaker Processing Job for post-processing + performance evaluation; S3 output + "report generated" event.

  5. Online (interactive): partner portal → SQS → Lambda worker pool → multi-threaded optimiser per update → S3 output + backend event-stream notification; update is also persisted to the offline feature store for future consistency.

Platform substrate

Both pipelines run on zFlow, Zalando's internal ML platform. zFlow provides:

  • Seamless AWS + Databricks integration — removes infrastructure-code overhead so teams can focus on ML-application logic.
  • In-transit and at-rest encryption for all artifacts by default.
  • Orchestration via AWS Step Functions (zFlow compiles Python DSL → Step Functions state machine via AWS CDK-generated CloudFormation).

Canonical earlier disclosure at sources/2022-04-18-zalando-zalandos-machine-learning-platform and sources/2021-02-15-zalando-a-machine-learning-pipeline-with-real-time-inference.

Numbers

| Quantity | Value | Source |
|---|---|---|
| SKUs in the weekly forecast | 5 million | post |
| Historical data window | 3 years (sliding window); 2.5-year timeframe used for seasonal capture | post |
| Forecast horizon | 12 weeks per (article_id, merchant_id, week) | post |
| Forecast distribution | Probabilistic (not point) | post |
| Forecast pipeline wall-clock | < 2 hours end-to-end | post |
| Online feature store latency | 10–20 ms per SKU, read/write | post |
| Offline feature store latency | "in the order of minutes" | post |
| Numba speedup vs NumPy | "2–3×" | post |
| Forecast cadence | Weekly | post |
| Optimisation cadence | Daily batch + real-time online | post |
| Catalogue complexity (problem frame) | Up to millions of articles; dozens of warehouses across multiple countries; seasonal rotating catalogue | post |

Extracted systems

Extracted concepts

Extracted patterns

Caveats / gaps

  • Partner portal technology stack unspecified. The post says partners get a "partner portal" giving a "holistic picture of inventory health and other metrics and KPIs" but does not disclose its underlying serving framework, caching tier, auth model, or latency numbers — only the online feature-store number (10–20 ms per SKU).
  • Black-box optimiser family not named. The post says "black-box gradient-free optimisers" without naming the algorithm family (CMA-ES, Bayesian optimisation, simulated annealing, Powell's method, etc.); evaluation budget per SKU is not disclosed either.
  • Monte Carlo sample size not disclosed. The objective cites Monte Carlo simulations, but the number of samples per θ evaluation is not quoted — a core cost lever for the daily batch run.
  • Warehouse multi-echelon logic not drilled into. The post mentions "dozens of warehouses spread across several countries" as part of the problem, but the output from the optimiser regarding where inventory should be allocated across that network is not detailed — only the high-level problem framing.
  • Feature-vector size (bytes) not disclosed. Online store latency per SKU is quoted as 10–20 ms — reasonable for a low-throughput KV store — but without feature-vector size or number of SKUs per interactive request, throughput calculations are not possible.
  • 12-week forecast wall-clock not decomposed. The post's "< 2 hours" figure is the full pipeline; no breakdown between pre-processing, transformation, training, and post-processing is given.
  • Conformal inference is mentioned but not explained. "Nixtla's MLForecast with conformal inference, and LightGBM for probabilistic forecasts" — the interaction between LightGBM point forecasts and conformal-based prediction intervals is not elaborated.
  • No mention of cost. Unlike the 2022 ML Platform post which calls out "low total cost of ownership," this post does not quote AWS bill figures for the 5M-SKU weekly run or the daily batch run.
  • Backfill / retraining strategy. Retraining cadence is weekly (full retrain) per the forecast pipeline; there's no mention of online-learning or incremental retraining strategies. Presumably full retrain every week is acceptable because the pipeline is under 2 hours.
  • Failure mode of the online path. If SQS backs up or Lambda concurrency hits a ceiling, the post doesn't describe fallback UX — does the partner portal fall back to the cached daily batch answer, or does it surface an error? Not addressed.

Wiki positioning

Source
