ZEOS Demand Forecaster¶
ZEOS Demand Forecaster is the weekly batch probabilistic-forecast pipeline inside the ZEOS Inventory Optimisation System. It produces a 12-week-ahead probabilistic forecast per (article_id, merchant_id, week) for 5 million SKUs (size + colour granularity) by training on 3 years of sliding-window history — full pipeline end-to-end under 2 hours.
Pipeline shape¶
Three stages (right-to-left per the post's Figure 2):
1. Feature Engineering¶
Feature engineering is split into two complementary tiers, following the data pre-processing vs data transformation split:
Data pre-processing layer — model upstream data into a human-understandable time-series representation, enabling easier validation and analysis.
- Tools: PySpark + Spark-SQL on Databricks transient job clusters writing to Delta Lake.
- Operations: joins, filters, aggregations.
- Window: 2.5-year timeframe — "enough seasonal patterns without overemphasising older historical performance."
- Scales horizontally; the volume grows linearly in SKUs × history.
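The join-and-aggregate shape of this tier can be sketched in plain Python (PySpark would do the same at scale; the rows and column names here are hypothetical, not Zalando's schema):

```python
from collections import defaultdict

# Hypothetical raw sales rows: (article_id, merchant_id, week, units_sold).
sales = [
    ("A1", "M1", "2024-W01", 3),
    ("A1", "M1", "2024-W01", 2),
    ("A1", "M1", "2024-W02", 4),
    ("A2", "M1", "2024-W01", 1),
]

# Aggregate to one value per (article_id, merchant_id, week) -- the
# human-understandable time-series representation this layer produces.
weekly = defaultdict(int)
for article_id, merchant_id, week, units in sales:
    weekly[(article_id, merchant_id, week)] += units

print(weekly[("A1", "M1", "2024-W01")])  # 5
```

Because each (article_id, merchant_id) key is independent, this aggregation partitions cleanly across workers, which is why the tier scales horizontally.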
Data transformation layer — engineer features that maximise predictive signals for model training.
- Tools: Pandas, scikit-learn, NumPy, Numba inside a SageMaker Processing Job.
- Operations: encoding, normalisation, etc.
- Scales vertically because scikit-learn / NumPy / Numba lack native distribution support.
Key transformations:
- Deriving historical demand from sales + stock / availability data.
- Pricing information: initial and discounted prices at weekly granularity.
- Article metadata (category, colour, material, …).
- Unique identifier per time-series: the (article_id, merchant_id) tuple — see concepts/skus-as-time-series-unit.
Forecasting-specific features (target lags / transformations, exogenous feature lags / transformations, temporal features) are not implemented in-house — handed off to Nixtla's MLForecast, which uses Numba under the hood.
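What MLForecast derives can be illustrated with a minimal sketch of target lags (hedged: this mimics the concept of MLForecast's `lags` option in plain Python; it is not the library's implementation, which compiles the loops with Numba):

```python
# Weekly demand history for one (article_id, merchant_id) series
# (hypothetical values).
demand = [10, 12, 9, 14, 11, 13]

def target_lags(series, lags):
    """Build lagged copies of the target: row t carries the value
    observed `lag` steps earlier, or None before enough history exists."""
    rows = []
    for t in range(len(series)):
        row = {f"lag{lag}": series[t - lag] if t - lag >= 0 else None
               for lag in lags}
        row["y"] = series[t]
        rows.append(row)
    return rows

features = target_lags(demand, lags=[1, 2])
print(features[2])  # {'lag1': 12, 'lag2': 10, 'y': 9}
```

Exogenous-feature lags and temporal features (e.g. week-of-year) follow the same per-row pattern.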
2. Model Training + Predictions¶
- Model: LightGBM via Nixtla's MLForecast.
- Compute: single SageMaker Training Job that owns both training and inference — no checkpointing, no separate inference infrastructure (see patterns/single-sagemaker-training-job-train-and-infer).
- Output: 12-week probabilistic demand forecast for each (article_id, merchant_id, week) combination.
Rationale from the post:
"After extensive experimentation with deep learning models like TFT and other machine learning approaches, we selected the LightGBM model integrated with Nixtla's MLForecast interface as the foundation of our demand forecasting pipeline."
And on the train+infer collapse:
"Due to the ML model's lightweight training footprint, we bypass complexity, like for example not needing checkpointing, or separate infrastructure for inference. Instead, model training as well as model inference are executed in a single pipeline using AWS SageMaker Training Jobs."
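The train-and-infer collapse amounts to a single job entrypoint that fits the model in memory and predicts immediately, with nothing persisted in between. A minimal sketch (a naive last-value model stands in for LightGBM; all names are illustrative, not Zalando's code):

```python
def train_and_predict(history, horizon=12):
    """Single-job entrypoint: fit, then infer in the same process.
    No checkpointing, no model registry hop, no serving endpoint."""
    # "Training": the naive stand-in model just remembers the last value.
    last = history[-1]
    # "Inference" runs straight after, inside the same job.
    return [last] * horizon

forecast = train_and_predict([10, 12, 9, 14], horizon=12)
print(len(forecast), forecast[0])  # 12 14
```

The pattern only works because, as the post notes, training is lightweight enough that re-fitting from scratch each week is cheaper than maintaining separate inference infrastructure.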
3. Post Processing¶
- Compute: SageMaker Processing Job.
- Jobs:
- Reshape predictions into a time-series representation consumable by the downstream replenishment recommender.
- Statistical model-performance analysis.
- Key business-metric computation.
- Outputs feed drift monitoring: metrics flow into AWS CloudWatch alarms, and Lambda functions push alerts to the relevant operator channels.
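The reshape step can be sketched as a long-to-wide pivot keyed by series (the schema below is illustrative, not the disclosed interface to the replenishment recommender):

```python
# Hypothetical long-format predictions: (article_id, merchant_id, week, p50).
preds = [
    ("A1", "M1", "2025-W01", 5.0),
    ("A1", "M1", "2025-W02", 6.0),
    ("A2", "M1", "2025-W01", 2.0),
]

# Pivot to one entry per (article_id, merchant_id) holding its ordered
# weekly values -- a time-series representation a downstream consumer
# can read per series.
series = {}
for article_id, merchant_id, week, value in preds:
    series.setdefault((article_id, merchant_id), []).append((week, value))
for key in series:
    series[key].sort()  # order by week label

print(series[("A1", "M1")])  # [('2025-W01', 5.0), ('2025-W02', 6.0)]
```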
Scale (verbatim)¶
"Our weekly forecasting pipeline processes 3 years of historical data for 5 million SKUs (size and colour) using a sliding window approach, and takes less than 2 hours. This high performance pipeline is enabled by a deliberate focus on data model design and I/O efficiency. We maintain a low total cost of ownership while ensuring reliability and scalability guarantees by leveraging zFlow and AWS-native services in our pipeline."
| Quantity | Value |
|---|---|
| SKUs | 5,000,000 (at size + colour granularity) |
| Historical input | 3 years, sliding window (2.5-year effective for seasonal capture) |
| Forecast horizon | 12 weeks ahead |
| Forecast cadence | Weekly |
| End-to-end wall-clock | < 2 hours |
| Forecast output per unit | Probabilistic distribution (not point estimate) |
| Keyed by | (article_id, merchant_id, week) |
Platform substrate¶
Runs on zFlow, which compiles the pipeline to an AWS Step Functions state machine via AWS CDK-generated CloudFormation. Databricks clusters and SageMaker jobs are launched per run as dedicated resources, so a failure in one run doesn't impact parallel executions — see patterns/transient-databricks-cluster-per-run.
Canonical disclosure¶
- sources/2025-06-29-zalando-building-a-dynamic-inventory-optimisation-system-a-deep-dive — full architectural disclosure: three-stage pipeline, feature-engineering two-tier split, LightGBM+MLForecast choice, train-infer-in-one-job compression, CloudWatch+Lambda drift monitoring, sub-2-hour wall-clock over 5M SKUs × 3-year window.
Seen in¶
- sources/2025-06-29-zalando-building-a-dynamic-inventory-optimisation-system-a-deep-dive — canonical first disclosure. First wiki instance of a Zalando pipeline using Nixtla + LightGBM for probabilistic time-series forecasting at 5M-SKU scale.
Related¶
- systems/zeos-inventory-optimisation-system — parent.
- systems/zeos-replenishment-recommender — consumes the 12-week probabilistic output.
- systems/zflow — orchestration substrate.
- systems/databricks · systems/delta-lake · systems/apache-spark — pre-processing tier.
- systems/aws-sagemaker-ai · systems/sagemaker-processing-job · systems/sagemaker-training-job — compute tier.
- systems/mlforecast-nixtla · systems/lightgbm · systems/numba — modelling stack.
- systems/aws-cloudwatch · systems/aws-lambda — drift monitoring.
- concepts/probabilistic-demand-forecast · concepts/sliding-window-training · concepts/data-preprocessing-vs-data-transformation-split · concepts/horizontal-vs-vertical-scalability-for-feature-engineering · concepts/single-pipeline-train-and-infer · concepts/transient-job-cluster · concepts/skus-as-time-series-unit · concepts/model-drift-monitoring
- patterns/pyspark-preprocessing-to-python-transformation-split · patterns/single-sagemaker-training-job-train-and-infer · patterns/transient-databricks-cluster-per-run
- companies/zalando