ZEOS Demand Forecaster¶
ZEOS Demand Forecaster is the weekly batch probabilistic-forecast pipeline inside the ZEOS Inventory Optimisation System. It produces a 12-week-ahead probabilistic forecast per (article_id, merchant_id, week) for 5 million SKUs (size + colour granularity) by training on 3 years of sliding-window history — full pipeline end-to-end under 2 hours.
Pipeline shape¶
Three stages (right-to-left per the post's Figure 2):
1. Feature Engineering¶
Feature engineering is split into two complementary tiers, following the data pre-processing vs data transformation split:
Data pre-processing layer — model upstream data into a human-understandable time-series representation, enabling easier validation and analysis.
- Tools: PySpark + Spark-SQL on Databricks transient job clusters writing to Delta Lake.
- Operations: joins, filters, aggregations.
- Window: 2.5-year timeframe — "enough seasonal patterns without overemphasising older historical performance."
- Scales horizontally; the volume grows linearly in SKUs × history.
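The join-and-aggregate shape of this tier can be sketched in plain Python (PySpark would do the same at scale; the rows and column names here are hypothetical, not Zalando's schema):

```python
from collections import defaultdict

# Hypothetical raw sales rows: (article_id, merchant_id, week, units_sold).
sales = [
    ("A1", "M1", "2024-W01", 3),
    ("A1", "M1", "2024-W01", 2),
    ("A1", "M1", "2024-W02", 4),
    ("A2", "M1", "2024-W01", 1),
]

# Aggregate to one value per (article_id, merchant_id, week) -- the
# human-understandable time-series representation this layer produces.
weekly = defaultdict(int)
for article_id, merchant_id, week, units in sales:
    weekly[(article_id, merchant_id, week)] += units

print(weekly[("A1", "M1", "2024-W01")])  # 5
```

Because each (article_id, merchant_id) key is independent, this aggregation partitions cleanly across workers, which is why the tier scales horizontally.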
Data transformation layer — engineer features that maximise predictive signals for model training.
- Tools: Pandas, scikit-learn, NumPy, Numba inside a SageMaker Processing Job.
- Operations: encoding, normalisation, etc.
- Scales vertically because scikit-learn / NumPy / Numba lack native distribution support.
Key transformations:
- Deriving historical demand from sales + stock / availability data.
- Pricing information: initial and discounted prices at weekly granularity.
- Article metadata (category, colour, material, …).
- Unique identifier per time-series: the (article_id, merchant_id) tuple — see concepts/skus-as-time-series-unit.
Forecasting-specific features (target lags / transformations, exogenous feature lags / transformations, temporal features) are not implemented in-house — handed off to Nixtla's MLForecast, which uses Numba under the hood.
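What MLForecast derives can be illustrated with a minimal sketch of target lags (hedged: this mimics the concept of MLForecast's `lags` option in plain Python; it is not the library's implementation, which compiles the loops with Numba):

```python
# Weekly demand history for one (article_id, merchant_id) series
# (hypothetical values).
demand = [10, 12, 9, 14, 11, 13]

def target_lags(series, lags):
    """Build lagged copies of the target: row t carries the value
    observed `lag` steps earlier, or None before enough history exists."""
    rows = []
    for t in range(len(series)):
        row = {f"lag{lag}": series[t - lag] if t - lag >= 0 else None
               for lag in lags}
        row["y"] = series[t]
        rows.append(row)
    return rows

features = target_lags(demand, lags=[1, 2])
print(features[2])  # {'lag1': 12, 'lag2': 10, 'y': 9}
```

Exogenous-feature lags and temporal features (e.g. week-of-year) follow the same per-row pattern.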
2. Model Training + Predictions¶
- Model: LightGBM via Nixtla's MLForecast.
- Compute: single SageMaker Training Job that owns both training and inference — no checkpointing, no separate inference infrastructure (see patterns/single-sagemaker-training-job-train-and-infer).
- Output: 12-week probabilistic demand forecast for each (article_id, merchant_id, week) combination.
Rationale from the post:
"After extensive experimentation with deep learning models like TFT and other machine learning approaches, we selected the LightGBM model integrated with Nixtla's MLForecast interface as the foundation of our demand forecasting pipeline."
And on the train+infer collapse:
"Due to the ML model's lightweight training footprint, we bypass complexity, like for example not needing checkpointing, or separate infrastructure for inference. Instead, model training as well as model inference are executed in a single pipeline using AWS SageMaker Training Jobs."
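The train-and-infer collapse amounts to a single job entrypoint that fits the model in memory and predicts immediately, with nothing persisted in between. A minimal sketch (a naive last-value model stands in for LightGBM; all names are illustrative, not Zalando's code):

```python
def train_and_predict(history, horizon=12):
    """Single-job entrypoint: fit, then infer in the same process.
    No checkpointing, no model registry hop, no serving endpoint."""
    # "Training": the naive stand-in model just remembers the last value.
    last = history[-1]
    # "Inference" runs straight after, inside the same job.
    return [last] * horizon

forecast = train_and_predict([10, 12, 9, 14], horizon=12)
print(len(forecast), forecast[0])  # 12 14
```

The pattern only works because, as the post notes, training is lightweight enough that re-fitting from scratch each week is cheaper than maintaining separate inference infrastructure.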
3. Post Processing¶
- Compute: SageMaker Processing Job.
- Jobs:
- Reshape predictions into a time-series representation consumable by the downstream replenishment recommender.
- Statistical model-performance analysis.
- Key business-metric computation.
- Outputs feed drift monitoring: metrics flow into AWS CloudWatch alarms, and Lambda functions push alerts to the relevant operator channels.
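The reshape step can be sketched as a long-to-wide pivot keyed by series (the schema below is illustrative, not the disclosed interface to the replenishment recommender):

```python
# Hypothetical long-format predictions: (article_id, merchant_id, week, p50).
preds = [
    ("A1", "M1", "2025-W01", 5.0),
    ("A1", "M1", "2025-W02", 6.0),
    ("A2", "M1", "2025-W01", 2.0),
]

# Pivot to one entry per (article_id, merchant_id) holding its ordered
# weekly values -- a time-series representation a downstream consumer
# can read per series.
series = {}
for article_id, merchant_id, week, value in preds:
    series.setdefault((article_id, merchant_id), []).append((week, value))
for key in series:
    series[key].sort()  # order by week label

print(series[("A1", "M1")])  # [('2025-W01', 5.0), ('2025-W02', 6.0)]
```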
Scale (verbatim)¶
"Our weekly forecasting pipeline processes 3 years of historical data for 5 million SKUs (size and colour) using a sliding window approach, and takes less than 2 hours. This high performance pipeline is enabled by a deliberate focus on data model design and I/O efficiency. We maintain a low total cost of ownership while ensuring reliability and scalability guarantees by leveraging zFlow and AWS-native services in our pipeline."
| Quantity | Value |
|---|---|
| SKUs | 5,000,000 (at size + colour granularity) |
| Historical input | 3 years, sliding window (2.5-year effective for seasonal capture) |
| Forecast horizon | 12 weeks ahead |
| Forecast cadence | Weekly |
| End-to-end wall-clock | < 2 hours |
| Forecast output per unit | Probabilistic distribution (not point estimate) |
| Keyed by | (article_id, merchant_id, week) |
Platform substrate¶
Runs on zFlow, which compiles the pipeline to an AWS Step Functions state machine via AWS CDK-generated CloudFormation. Databricks clusters and SageMaker jobs are launched per run as dedicated resources, so a failure in one run doesn't impact parallel executions — see patterns/transient-databricks-cluster-per-run.
Canonical disclosure¶
- sources/2025-06-29-zalando-building-a-dynamic-inventory-optimisation-system-a-deep-dive — full architectural disclosure: three-stage pipeline, feature-engineering two-tier split, LightGBM+MLForecast choice, train-infer-in-one-job compression, CloudWatch+Lambda drift monitoring, sub-2-hour wall-clock over 5M SKUs × 3-year window.
Seen in¶
- sources/2025-06-29-zalando-building-a-dynamic-inventory-optimisation-system-a-deep-dive — canonical first disclosure. First wiki instance of a Zalando pipeline using Nixtla + LightGBM for probabilistic time-series forecasting at 5M-SKU scale.
Related¶
- systems/zeos-inventory-optimisation-system — parent.
- systems/zeos-replenishment-recommender — consumes the 12-week probabilistic output.
- systems/zflow — orchestration substrate.
- systems/databricks · systems/delta-lake · systems/apache-spark — pre-processing tier.
- systems/aws-sagemaker-ai · systems/sagemaker-processing-job · systems/sagemaker-training-job — compute tier.
- systems/mlforecast-nixtla · systems/lightgbm · systems/numba — modelling stack.
- systems/aws-cloudwatch · systems/aws-lambda — drift monitoring.
- concepts/probabilistic-demand-forecast · concepts/sliding-window-training · concepts/data-preprocessing-vs-data-transformation-split · concepts/horizontal-vs-vertical-scalability-for-feature-engineering · concepts/single-pipeline-train-and-infer · concepts/transient-job-cluster · concepts/skus-as-time-series-unit · concepts/model-drift-monitoring
- patterns/pyspark-preprocessing-to-python-transformation-split · patterns/single-sagemaker-training-job-train-and-infer · patterns/transient-databricks-cluster-per-run
- companies/zalando