Zalando — Paper Announcement: A Practical Approach to Replenishment Optimization with Extended (R, s, Q) Policy and Probabilistic Models

Summary

Zalando's blog announcement (2026-01-14) of their Nature Scientific Reports publication on the algorithmic core of the ZEOS Inventory Optimisation System, the follow-up to the 2025-06-29 architecture deep-dive. Where the 2025-06 post disclosed the platform shape (zFlow orchestration, SageMaker substrate, online+offline feature store, SQS+Lambda interactive path, Monte Carlo + black-box optimiser), this announcement discloses the algorithm itself: an Extended (R, s, Q) policy with kick-start and lifecycle-cutoff extensions; Discrete Event Simulation (DES) over a 12-week horizon; minimisation of the 75th percentile of cost (not the mean) as the risk-aware optimisation objective; quantile LightGBM demand forecasts; and Gamma-distributed lead times. A 12-month computational backtest over ~2M articles and ~800 merchants shows +22% GMV, +22% Gross Margin, +34% availability, and +24% demand fill rate vs human replenishment baselines. An ablation study decomposes the gain: both the probabilistic forecast and the percentile objective are needed, with the probabilistic forecast carrying the biggest single gain and the percentile objective adding the final tail-protection layer.

Key takeaways

Extended (R, s, Q) policy is the named replenishment policy class. Classical (R, s, Q) — periodic review at interval R, reorder quantity Q when on-hand inventory drops below reorder point s — is extended with two new parameters:
Kick-start quantity Q₀ at kick-start time t₀ — aggressive initial replenishment during a product's launch phase.
Lifecycle cutoff t_limit — conservative (or zero) replenishment once the article enters decay phase.

Verbatim: "We extended the classical (R, s, Q) policy by introducing an initial kick-start quantity (Q₀) and a time-based lifecycle cutoff (t_limit). This allows the policy to be aggressive during a product's launch and conservative as it reaches its decay phase." The policy parameter vector is θ = (t₀, Q₀, s, Q).
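
The policy described above can be sketched as a small decision function. This is a minimal illustration, not the paper's implementation. The class name `ExtendedRsQPolicy`, reviewing on weeks divisible by R, placing the kick-start order exactly at week t₀, and treating the lifecycle cutoff as zero replenishment (the source allows "conservative (or zero)") are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtendedRsQPolicy:
    """Hypothetical container for the extended policy parameters."""
    R: int        # review interval, in weeks
    s: float      # reorder point
    Q: float      # regular reorder quantity
    t0: int       # kick-start week
    Q0: float     # kick-start quantity
    t_limit: int  # lifecycle cutoff week

    def order_quantity(self, week: int, inventory_position: float) -> float:
        # Assumption: no replenishment at all once the cutoff is reached.
        if week >= self.t_limit:
            return 0.0
        # Assumption: the kick-start order replaces the regular check at t0.
        if week == self.t0:
            return self.Q0
        # Classical (R, s, Q) rule: at a review point, order Q if below s.
        if week % self.R == 0 and inventory_position < self.s:
            return self.Q
        return 0.0


policy = ExtendedRsQPolicy(R=2, s=50.0, Q=100.0, t0=0, Q0=300.0, t_limit=8)
# Orders over ten weeks if the inventory position stayed at 40 units throughout.
orders = [policy.order_quantity(week, inventory_position=40.0) for week in range(10)]
```

With these illustrative values the policy orders the kick-start quantity in week 0, the regular quantity at each later review while stock is below `s`, and nothing from week 8 onwards.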

The simulator is a Discrete Event Simulation (DES) over a 12-week horizon. Each Monte Carlo run represents one alternate timeline where demand, returns, and lead times evolve stochastically. Within each simulated week, inventory follows a precise intra-week sequence:
Intra-week processing — expected inbounds and returns are added half before and half after demand fulfilment to approximate a continuous flow of goods.
Demand realisation — sample weekly demand from the probabilistic forecast; fulfil from on-hand inventory.
Replenishment decisions — at review points, compare inventory to reorder point s; if breached, trigger a replenishment of size Q that enters transit with a sampled lead time.
Cost accumulation — track storage, inbound, outbound, return, and lost-sales costs across the full 12-week horizon.

(Source: sources/2026-01-14-zalando-paper-announcement-replenishment-optimization-extended-rsq.)
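
The intra-week sequence above can be sketched as a single-timeline simulator. This is a simplified illustration under stated assumptions, not the production simulator: returns are omitted for brevity, demand is a placeholder Gamma draw rather than a sample from the quantile forecast, lead times are rounded to whole weeks, and the function name `simulate_timeline` and all numeric parameters are hypothetical.

```python
import random


def simulate_timeline(rng, s, Q, R, initial_stock, horizon=12):
    """Simulate one alternate timeline and return the quantities to be priced."""
    on_hand = float(initial_stock)
    in_transit = {}  # arrival week -> units due that week
    totals = {"demand": 0.0, "sold": 0.0, "lost": 0.0,
              "received": 0.0, "stock_weeks": 0.0}

    for week in range(horizon):
        inbound = in_transit.pop(week, 0.0)

        # Intra-week processing: half of the inbound lands before demand...
        on_hand += inbound / 2

        # Demand realisation: placeholder draw with mean 4.0 * 5.0 = 20 units.
        demand = rng.gammavariate(4.0, 5.0)
        sold = min(on_hand, demand)
        on_hand -= sold

        # ...and the other half lands after demand fulfilment.
        on_hand += inbound / 2

        # Replenishment decision at review points, with a sampled lead time
        # of at least one week.
        position = on_hand + sum(in_transit.values())
        if week % R == 0 and position < s:
            lead_weeks = max(1, round(rng.gammavariate(4.0, 0.5)))
            arrival = week + lead_weeks
            in_transit[arrival] = in_transit.get(arrival, 0.0) + Q

        # Accumulate what a cost model would later price.
        totals["demand"] += demand
        totals["sold"] += sold
        totals["lost"] += demand - sold  # unmet demand stays visible
        totals["received"] += inbound
        totals["stock_weeks"] += on_hand

    totals["final_on_hand"] = on_hand
    return totals


# One Monte Carlo run per seed; each seed is one alternate timeline.
timelines = [
    simulate_timeline(random.Random(seed), s=40, Q=60, R=1, initial_stock=50)
    for seed in range(1000)
]
```

Because unmet demand is computed from the sampled demand rather than from observed sales, the `lost` total keeps the demand that a stockout would otherwise hide.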

The optimisation objective is risk-aware: minimise the 75th percentile of cost, not the mean. Standard stochastic optimisation minimises E[cost]; Zalando minimises P75(cost) across Monte Carlo samples. Verbatim: "Instead of minimizing the average cost, we minimize the 75th percentile of the cost distribution. This ensures our decisions protect against extreme, rare demand spikes." This is the load-bearing choice that the ablation study isolates as responsible for the final tail-protection stability layer on top of the probabilistic-forecast gain.
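
A small worked example shows how the choice of objective can change which policy wins. The cost samples and policy names below are invented for illustration, and the nearest-rank percentile definition is an assumption (the post does not say which estimator is used).

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile: smallest sample with at least q% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def mean(samples):
    return sum(samples) / len(samples)


# Hypothetical simulated costs for two candidate policies over eight timelines.
simulated_costs = {
    "lean": [80, 80, 80, 80, 80, 140, 140, 140],  # cheap usually, costly in 3 of 8 runs
    "buffered": [105] * 8,                        # steady cost in every run
}

best_by_mean = min(simulated_costs, key=lambda name: mean(simulated_costs[name]))
best_by_p75 = min(simulated_costs, key=lambda name: percentile(simulated_costs[name], 75))
```

Here the mean objective prefers `lean` (102.5 vs 105), while the 75th-percentile objective prefers `buffered` (105 vs 140), because the costly outcomes of `lean` occur often enough to reach the upper quartile.
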
Stockout demand is handled by counterfactual modelling, not rough guesses. When a stockout occurs in a simulated timeline, the demand that would have happened is modelled using the same probabilistic distributions driving the forecast — not zeroed out or truncated. Verbatim: "We handle the 'unseen' — like demand that would have happened during a stock-out — using probabilistic distributions rather than rough guesses." This is the simulator's answer to the censored-observation problem in classical inventory theory.
The full cost objective has five pillars and is more granular than the prior disclosure. The 2025-06 post listed five costs (storage / lost-sales / overstock / operations / inbound); the paper restates them as a more precise five-pillar decomposition:

$$\theta^* = \arg\min_{\theta \in \Theta} \left[ C_{\text{holding}}(\theta) + C_{\text{inbound}}(\theta) + C_{\text{outbound}}(\theta) + C_{\text{returns}}(\theta) + C_{\text{lost sales}}(\theta) \right]$$

Holding / storage — fees weekly on physical stock levels.
Inbound — moving goods to the fulfilment centres.
Outbound — moving goods from the fulfilment centres.
Returns — specific processing fees for returned items.
Lost sales — margin lost when demand is unmet, adjusted by return rates to reflect "realised" lost sales (return-rate adjustment is a load-bearing detail — a lost sale that would have been returned anyway isn't really lost margin).
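
The five cost terms can be sketched as a per-timeline breakdown. The rates, field names, and in particular the lost-sales formula (margin × unmet units × (1 − return rate)) are assumptions chosen to illustrate the return-rate adjustment; the post does not give the exact expression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostRates:
    """Hypothetical per-unit rates for one article."""
    holding_per_unit_week: float
    inbound_per_unit: float
    outbound_per_unit: float
    return_fee_per_unit: float
    margin_per_unit: float
    return_rate: float  # share of sold units that come back


def cost_breakdown(rates, stock_weeks, inbound_units, sold_units, unmet_units):
    returned_units = sold_units * rates.return_rate
    return {
        "holding": rates.holding_per_unit_week * stock_weeks,
        "inbound": rates.inbound_per_unit * inbound_units,
        "outbound": rates.outbound_per_unit * sold_units,
        "returns": rates.return_fee_per_unit * returned_units,
        # Assumed adjustment: only unmet demand that would have been kept counts.
        "lost_sales": rates.margin_per_unit * unmet_units * (1 - rates.return_rate),
    }


rates = CostRates(
    holding_per_unit_week=0.1, inbound_per_unit=0.5, outbound_per_unit=2.0,
    return_fee_per_unit=1.5, margin_per_unit=10.0, return_rate=0.4,
)
parts = cost_breakdown(rates, stock_weeks=1000, inbound_units=200,
                       sold_units=150, unmet_units=20)
total_cost = sum(parts.values())
```

With a 40% return rate, 20 unmet units at a margin of 10 contribute 120 rather than 200 to the total, since two fifths of those sales would have been returned.
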
Lead times are sampled from Gamma distributions during simulation. Replenishment lead times are not deterministic: they are drawn from empirically fitted Gamma distributions. Returns lead times come from empirical historical distributions. Lead times are therefore one of the stochastic inputs sampled inside the DES, alongside demand and returns.
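
As a rough sketch of Gamma lead-time sampling, the snippet below fits a Gamma distribution to a handful of observed lead times and draws from it. The method-of-moments fit, the helper name `fit_gamma_by_moments`, and the observations are assumptions for illustration; the post says only that the distributions are fitted empirically.

```python
import random
import statistics


def fit_gamma_by_moments(observations):
    """Return (shape, scale) so the Gamma mean and variance match the sample."""
    sample_mean = statistics.mean(observations)
    sample_variance = statistics.variance(observations)
    shape = sample_mean * sample_mean / sample_variance
    scale = sample_variance / sample_mean
    return shape, scale


# Hypothetical historical replenishment lead times, in days.
observed_lead_days = [10, 12, 14, 9, 15, 11, 13, 12]
shape, scale = fit_gamma_by_moments(observed_lead_days)

# random.gammavariate takes (shape, scale); its mean is shape * scale.
rng = random.Random(0)
sampled_lead_days = [rng.gammavariate(shape, scale) for _ in range(20000)]
```

A Gamma distribution is strictly positive and right-skewed, which suits lead times: they cannot be negative, and occasional long delays are more plausible than equally large early arrivals.
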
Computational backtest: 1 year (Oct 2023 – Sep 2024), ~2 million articles across ~800 merchants. Benchmarked against professional human replenishment decisions. Results (verbatim table):

Metric	Engine vs Human Uplift
Gross Merchandise Value (GMV)	+22.11%
Gross Margin (GMV after FC)	+21.95%
Weighted Weekly Availability	+33.63%
Weighted Demand Fill Rate	+23.63%

Absolute levels: demand fill rate 91.14%, availability rate 86.40% — "significantly outperforming human benchmarks across the entire temporal horizon." Generalisation: ~70–80% of merchants saw positive financial uplifts. Caveat: these are theoretical 100%-adoption numbers; actual uplift depends on merchant adoption rate since the tool is an AI decision-support assistant and merchants retain final authority.

Baseline comparison: Extended (R, s, Q) materially beats all classical alternatives under identical data + simulation settings.

Policy	GMV Uplift	GMV after FC Uplift	Availability Uplift	Demand Fill Rate Uplift
Extended (R, s, Q) (ours)	+22.11%	+21.95%	+33.63%	+23.63%
Tuned (s, S)	+13.39%	+14.80%	+18.65%	+14.35%
Periodic base-stock	+12.50%	+13.89%	+17.99%	+14.19%
Myopic Newsvendor	+5.07%	+5.60%	+11.61%	+8.10%

Load-bearing finding: even the tuned (s, S) policy — a common industry standard — falls short because its static thresholds cannot match the responsiveness of the extended variables (Q₀ and t_limit) in a high-variance environment. Myopic Newsvendor underperforms because it lacks the foresight to handle lead-time and return uncertainty.

Ablation study: the secret-sauce decomposition.

Configuration	GMV Uplift	GMV after FC Uplift	Availability (absolute)	Demand Fill Rate (absolute)
Probabilistic Forecast + Percentile Objective (ours)	+22.11%	+21.95%	86.40%	91.14%
Probabilistic Forecast + Mean Objective	+19.02%	+20.16%	81.27%	87.98%
Point Forecast + Percentile Objective	+6.37%	+5.98%	77.76%	84.95%

Verbatim conclusion: "You need both. Switching from point forecasts to probabilistic ones provides the single largest gain. However, optimizing for the 75th percentile rather than the average provides that final, critical layer of stability, particularly in protecting the merchant against high-impact 'tail' events."

This is the canonical empirical decomposition for the wiki's patterns/probabilistic-forecast-plus-percentile-objective pattern: probabilistic forecast is the first-order lever, percentile objective is the second-order stability layer.
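
The first-order versus second-order claim can be checked directly against the GMV column of the ablation table. The arithmetic below uses only the three reported configurations; the fourth cell (point forecast + mean objective) is not reported, so this is a pair of one-at-a-time differences rather than a full factorial decomposition.

```python
# GMV uplift (%) by (forecast type, objective type), taken from the ablation table.
gmv_uplift_pct = {
    ("probabilistic", "percentile"): 22.11,
    ("probabilistic", "mean"): 19.02,
    ("point", "percentile"): 6.37,
}

full = gmv_uplift_pct[("probabilistic", "percentile")]

# Uplift given up when each ingredient is swapped out, in percentage points.
lost_without_probabilistic_forecast = round(full - gmv_uplift_pct[("point", "percentile")], 2)
lost_without_percentile_objective = round(full - gmv_uplift_pct[("probabilistic", "mean")], 2)
```

Dropping the probabilistic forecast costs 15.74 percentage points of GMV uplift, while dropping the percentile objective costs 3.09, which matches the ordering stated in the verbatim conclusion.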

Forecaster detail: quantile LightGBM. The demand forecaster is explicitly LightGBM producing quantile forecasts — not conformal inference around a point-forecast LightGBM (which the 2025-06 post suggested via MLForecast's typical wrapping). Verbatim: "By using quantile forecasts from our LightGBM-based demand service, the system accounts for tail risks — those rare but financially significant demand spikes that a simple average would miss." The quantile-regression framing makes tail-coverage a first-class property of the forecast, not a calibration post-processing step.
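
Quantile regression is usually trained with the pinball (quantile) loss; LightGBM exposes this as `objective="quantile"` with an `alpha` parameter. The post does not say which quantiles or training configuration Zalando uses, so the standard-library sketch below only illustrates the loss itself, on a hypothetical demand history, without calling LightGBM.

```python
def pinball_loss(actuals, predictions, alpha):
    """Mean pinball loss: under-prediction is weighted alpha, over-prediction 1 - alpha."""
    total = 0.0
    for actual, predicted in zip(actuals, predictions):
        error = actual - predicted
        total += alpha * error if error >= 0 else (alpha - 1) * error
    return total / len(actuals)


# Hypothetical weekly demand history: 1, 2, ..., 99 units.
demand_history = list(range(1, 100))

# The constant forecast that minimises the alpha = 0.75 pinball loss.
best_constant = min(
    range(1, 100),
    key=lambda guess: pinball_loss(
        demand_history, [guess] * len(demand_history), alpha=0.75
    ),
)
```

At `alpha=0.75` an under-forecast is penalised three times as heavily as an over-forecast of the same size, so the loss-minimising constant sits at the 75th percentile of the history (75 in this example) rather than at its mean of 50.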

Operational numbers

Quantity	Value
Backtest period	Oct 2023 – Sep 2024 (12 months)
Articles in backtest	~2,000,000
Merchants in backtest	~800
Simulation horizon per run	12 weeks
Optimisation objective percentile	75th (not mean)
GMV uplift vs human baseline	+22.11% (theoretical 100%-adoption)
GMV-after-FC uplift	+21.95%
Availability uplift	+33.63% (to 86.40% absolute)
Demand fill rate uplift	+23.63% (to 91.14% absolute)
Fraction of merchants with positive uplift	70–80%
Forecast model	LightGBM quantile regression
Lead time distribution	Gamma (sampled per run)

Caveats

Paper-announcement framing, not raw architecture disclosure. The post summarises + frames the academic paper; the paper itself in Nature Scientific Reports has the full formal methodology. This ingest canonicalises the publicly-framed algorithm + results.
Theoretical 100%-adoption uplift. The headline +22% GMV is what would happen if every merchant followed every recommendation. In practice, merchants are the final authority and adoption is partial — realised uplift depends on merchant adoption rate, which the post does not quantify. This is an important framing distinction from "the engine improves GMV by 22%" — the engine offers decisions that would improve GMV by 22% if adopted.
Sample count in Monte Carlo / optimiser iteration budget not disclosed. The DES runs "thousands of plausible futures per candidate policy" but the exact N, optimisation iteration count, and total compute budget are not quantified.
Specific gradient-free optimiser family still not disclosed. Both the 2025-06 post and this 2026-01 paper announcement avoid naming the optimiser (CMA-ES, Bayesian optimisation, Nelder-Mead, simulated annealing — all plausible from the stated properties, but unnamed).
Human baseline details not quantified. The "professional human replenishment decisions" benchmark is not characterised (team size, decision cadence, tool ecosystem, merchant-segment coverage) — the engine's 22% uplift is the composite of (better algorithm + richer forecast + faster cadence + more consistent execution) vs human baseline, not a controlled test against a specific alternative.
Extended (R, s, Q) vs (s, S) is not the full-spectrum comparison. The paper compares against Tuned (s, S), Periodic base-stock, and Myopic Newsvendor — three classical families. Newer ML-based replenishment algorithms (deep reinforcement learning approaches, etc.) are not in the comparison. The "industry-standard alternative" framing is accurate for classical inventory theory but not for the full ML replenishment landscape.
Generalisation to non-fashion commerce is not addressed. The 70–80% merchant generalisation is within Zalando's catalogue — the algorithmic choices (kick-start + lifecycle cutoff; 12-week horizon; weekly review) are tuned to fashion's launch/decay product lifecycles. Application to stable-catalogue commerce (groceries, commodities) would require re-parameterisation and possibly different policy-class extensions.

Series continuity

This is the second axis-20 post on the wiki — the 2025-06-29 post disclosed the platform shape of ZEOS Inventory Optimisation, and this 2026-01-14 paper announcement discloses the algorithm shape (Extended (R, s, Q) + DES + percentile objective + quantile LightGBM + 22% GMV backtest). Together they form a complete architecture + algorithm disclosure of the same production system. The earlier post establishes the two-pipeline topology (systems/zeos-demand-forecaster + systems/zeos-replenishment-recommender); this post establishes the algorithm that runs inside the optimisation pipeline.

Source

Original: https://engineering.zalando.com/posts/2026/01/publication-replenishment-engine.html
Nature Scientific Reports paper: https://www.nature.com/articles/s41598-025-32537-2
Prior architecture post: https://engineering.zalando.com/posts/2025/06/inventory-optimisation-system.html
Raw markdown: raw/zalando/2026-01-14-paper-announcement-a-practical-approach-to-replenishment-opt-9b60a20a.md

Related

sources/2025-06-29-zalando-building-a-dynamic-inventory-optimisation-system-a-deep-dive — architecture-shape prior; this post is the algorithm-shape companion.
systems/zeos-inventory-optimisation-system — parent system.
systems/zeos-replenishment-recommender — the pipeline where Extended (R, s, Q) + DES + percentile objective runs.
systems/zeos-demand-forecaster — the upstream quantile-LightGBM producer feeding Monte Carlo samples.
concepts/extended-r-s-q-policy — canonical concept page for the extended policy.
concepts/discrete-event-simulation — canonical concept page for the DES simulator.
concepts/percentile-objective-optimisation — canonical concept page for the P75 cost-minimisation objective.
concepts/counterfactual-stockout-demand-modeling — canonical concept page for modelling censored stockout demand via probabilistic distributions.
concepts/computational-backtest — canonical concept page for the replenishment-backtest methodology.
concepts/ablation-study-forecast-vs-objective — canonical concept page for the forecast-type × objective-type ablation decomposition.
concepts/probabilistic-demand-forecast — extended with quantile LightGBM canonicalisation.
concepts/monte-carlo-simulation-under-uncertainty — extended with DES-as-simulator composition.
concepts/gradient-free-black-box-optimisation — extended with this source.
patterns/probabilistic-forecast-plus-percentile-objective — the canonical composition pattern isolated by the ablation.
patterns/des-plus-gradient-free-optimiser-under-uncertainty — the architectural pairing for decision-under-uncertainty problems.
patterns/two-stage-forecast-plus-optimisation-pipeline — parent pattern.
companies/zalando