
PATTERN

Probabilistic forecast + percentile objective

Problem

Decision-under-uncertainty systems (inventory, capacity, pricing, bidding) face two orthogonal axes:

  • Forecast axis — what uncertainty model does the system consume? Point estimate vs full distribution.
  • Objective axis — how does the optimiser aggregate the cost distribution into a scalar? Mean vs percentile (P75 / P90 / CVaR).

Teams often treat these independently: the forecasting team picks a model, the optimisation team picks an objective. But the two choices interact multiplicatively rather than additively — and picking only one leaves most of the other's value unrealised.

Pattern

Compose both levers together as a single architectural primitive:

  1. Probabilistic forecast — produce a full distribution over the uncertain variable (demand / price / lead time) per entity × period. Options:
       • Quantile regression (LightGBM quantile — Zalando's choice).
       • Native probabilistic models (DeepAR, TFT, NeuralProphet).
       • Conformal-inference wrapper around a point forecast.
  2. Monte Carlo evaluation — sample from the forecast distribution; forward-simulate via a DES or equivalent per candidate decision θ.
  3. Percentile objective — minimise the 75th percentile (or higher) of the cost distribution rather than the mean. Captures tail risk explicitly.
  4. Gradient-free optimiser — because percentile-of-Monte-Carlo is non-differentiable and noisy, search θ-space with a black-box optimiser.

The four components form a dependency chain: percentile objective needs Monte Carlo samples, which need probabilistic forecasts. Any break in the chain collapses the pattern back to classical expected-value optimisation.
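The chain can be sketched end to end in a few lines. This is a minimal illustration, not the production system: the lognormal quantile function, the asymmetric cost, and the integer candidate grid are hypothetical stand-ins for the real forecaster, DES, and policy space.

```python
import math
import random
import statistics

def sample_demand(quantile_fn, n, rng):
    # Inverse-CDF sampling: a quantile forecast doubles as a sampler.
    return [quantile_fn(rng.random()) for _ in range(n)]

def cost(theta, demand):
    # Hypothetical asymmetric cost: a unit of stockout costs 5x a unit of overstock.
    return 5.0 * max(demand - theta, 0.0) + max(theta - demand, 0.0)

def p75(xs):
    # Nearest-rank 75th percentile.
    return sorted(xs)[int(0.75 * (len(xs) - 1))]

def optimise(quantile_fn, candidates, n_samples=2000, seed=0):
    # Gradient-free search: score every candidate theta on the SAME
    # Monte Carlo scenarios (common random numbers reduce comparison
    # noise), minimising the P75 of the resulting cost distribution.
    rng = random.Random(seed)
    demands = sample_demand(quantile_fn, n_samples, rng)
    return min(candidates, key=lambda t: p75([cost(t, d) for d in demands]))

# Toy probabilistic forecast: a lognormal demand quantile function.
norm = statistics.NormalDist()
q = lambda u: math.exp(1.0 + 0.5 * norm.inv_cdf(u))
best_theta = optimise(q, candidates=range(0, 21))
```

In the real pattern, `quantile_fn` comes from the forecasting model, `cost` from the DES, and the candidate set from the policy class; the structure that carries over is sample → simulate → take P75 → black-box search.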

Why the composition works (empirical)

From Zalando's ablation in the 2026-01-14 paper announcement:

Configuration                        GMV Uplift
Probabilistic + Percentile (P75)     22.11%
Probabilistic + Mean                 19.02%
Point + Percentile                    6.37%

Deltas:

  • Probabilistic contribution (Row 1 vs Row 3): +15.74pp GMV — first-order lever.
  • Percentile contribution (Row 1 vs Row 2): +3.09pp GMV — second-order lever (stability / tail protection).
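The deltas follow directly from the table:

```python
uplift = {
    "probabilistic + P75": 22.11,   # row 1
    "probabilistic + mean": 19.02,  # row 2
    "point + P75": 6.37,            # row 3
}
# First-order lever: value of the probabilistic forecast (row 1 vs row 3).
probabilistic_lever = round(uplift["probabilistic + P75"] - uplift["point + P75"], 2)
# Second-order lever: value of the percentile objective (row 1 vs row 2).
percentile_lever = round(uplift["probabilistic + P75"] - uplift["probabilistic + mean"], 2)
# probabilistic_lever == 15.74, percentile_lever == 3.09 (in pp GMV)
```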

Verbatim: "You need both. Switching from point forecasts to probabilistic ones provides the single largest gain. However, optimizing for the 75th percentile rather than the average provides that final, critical layer of stability."

See concepts/ablation-study-forecast-vs-objective for the full decomposition.

Why each lever alone is insufficient

  • Probabilistic forecast + mean objective. The mean of the Monte Carlo samples collapses tail information. You pay for the probabilistic model but aggregate away its biggest contribution. Still beats point-forecast baseline because the mean of a well-calibrated distribution is a better central estimate than a raw point forecast, but leaves substantial gain on the table.
  • Point forecast + percentile objective. The cost "distribution" is a point mass at the forecast value; P75 = mean = point forecast. Percentile objective has nothing to act on. Effectively equivalent to mean objective in this configuration.

Both-or-nothing structure: you need the probabilistic forecast to generate tail information, AND the percentile objective to use it.
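The degeneracy is easy to see numerically. A sketch with made-up demand samples — a point mass versus a distribution with a right tail:

```python
import statistics

def p75(xs):
    return statistics.quantiles(xs, n=4)[2]  # third quartile

point = [100.0] * 1000                               # point forecast: a point mass
prob = [80.0] * 250 + [100.0] * 500 + [150.0] * 250  # probabilistic: right tail

# Point mass: P75 == mean == the forecast value, so the percentile
# objective has nothing to act on.
print(p75(point), statistics.mean(point))   # 100.0 100.0
# Full distribution: P75 sees the tail the mean averages away.
print(p75(prob), statistics.mean(prob))     # 137.5 107.5
```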

Canonical instance (Zalando ZEOS)

systems/zeos-replenishment-recommender composes:

  • Probabilistic forecast via LightGBM quantile regression in Nixtla MLForecast — produces 12-week quantile forecast per (article_id, merchant_id).
  • Monte Carlo evaluation via DES — runs thousands of alternate-timeline simulations per candidate θ.
  • Percentile objective — minimises P75 of the five-pillar cost across DES samples.
  • Gradient-free optimiser (unnamed family) — searches θ = (t₀, Q₀, s, Q) per article × merchant in the Extended (R, s, Q) policy class.
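The Extended (R, s, Q) policy is not public beyond its parameter names, but the inner loop can be sketched as a plain (s, Q) reorder rule scored over one simulated timeline. Everything here is a simplification: the function name, the shortage-only cost (standing in for the five-pillar cost), and the single-order pipeline are illustrative assumptions.

```python
def simulate_sq_policy(demand_path, s, Q, lead_time=2, start_stock=0):
    """Score one Monte Carlo timeline of a simple (s, Q) rule: when the
    inventory position falls below s, order Q; orders arrive lead_time
    periods later. Returns total units short."""
    on_hand, pipeline = start_stock, []   # pipeline: (arrival_period, qty)
    shortage = 0
    for t, demand in enumerate(demand_path):
        on_hand += sum(q for a, q in pipeline if a == t)   # receive arrivals
        pipeline = [(a, q) for a, q in pipeline if a > t]
        shortage += max(demand - on_hand, 0)               # lost sales
        on_hand = max(on_hand - demand, 0)
        position = on_hand + sum(q for _, q in pipeline)   # incl. on-order
        if position < s:                                   # reorder trigger
            pipeline.append((t + lead_time, Q))
    return shortage
```

The optimiser would call this across thousands of demand paths sampled from the quantile forecast and rank each θ by the P75 of the returned costs.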

Backtest over ~2M articles × ~800 merchants × 12 months shows +22% GMV, +22% margin, +34% availability, +24% fill rate vs human replenishment baselines.

Tradeoffs

  • Probabilistic forecast cost. Producing full distributions costs more (training, serving, storage) than point forecasts. Quantile regression at scale requires engineering investment; conformal-inference wrappers are cheaper but often less calibrated.
  • Monte Carlo sample budget. Percentile estimation needs enough samples for the tail to be stably estimable, and the requirement grows sharply with the percentile. Rough rule: P75 needs roughly 4× more samples than the mean for equivalent stability; P95 needs ~20× more.
  • Non-differentiable objective. Percentile of Monte Carlo output is non-smooth in θ — forces gradient-free optimisation. Scales to ~50-dimensional θ before degrading.
  • Hyperparameter: which percentile? P75 is Zalando's choice. P90 is more conservative (better tail protection, more under-utilisation). P50 (median) is essentially mean in symmetric distributions (no tail protection). Sensitivity to α is not systematically reported in the paper.
  • Forecast-quality sensitivity. If the probabilistic forecast has biased tails (over- or under-dispersed), the percentile objective amplifies the bias. Calibration monitoring is essential.
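The sample-budget point can be checked with a small experiment (illustrative distribution and budgets, not Zalando's numbers): at a fixed budget, the empirical P95 is a noisier estimator than the empirical P75.

```python
import random
import statistics

def percentile(xs, p):
    # Nearest-rank percentile.
    return sorted(xs)[int(p * (len(xs) - 1))]

def estimator_sd(p, n_samples=200, n_trials=500, seed=1):
    # Std-dev of the empirical p-quantile of Exp(1) samples across
    # repeated trials: a proxy for Monte Carlo estimation noise.
    rng = random.Random(seed)
    estimates = [
        percentile([rng.expovariate(1.0) for _ in range(n_samples)], p)
        for _ in range(n_trials)
    ]
    return statistics.stdev(estimates)

# Deeper tail -> noisier estimate at the same sample budget.
assert estimator_sd(0.95) > estimator_sd(0.75)
```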

When to apply

  • Asymmetric cost function. Stockout cost >> overstock cost, or vice versa — percentile objective pays off.
  • Rare but catastrophic failures matter. Tail events (Black Friday demand spike, supplier outage) drive business outcomes — mean objective misses them.
  • Decision is irreversible in the short term. Inventory ordered this week is on shelf for weeks; you can't re-optimise intra-week if realised demand differs from mean.

When NOT to apply

  • Symmetric, steady demand. If cost is symmetric and demand is near-Gaussian, the P75-optimal decision all but coincides with the mean-optimal one — there is no asymmetric tail to protect against. Stick with the mean objective and save the engineering investment.
  • High-frequency, reversible decisions. If you can re-decide every minute (e.g. HFT), tail protection per decision matters less — the next decision can compensate.
  • Probabilistic forecast not feasible. If the uncertainty model can't be calibrated reliably, percentile objective amplifies the calibration bias. Better to invest in point-forecast-plus-safety-stock heuristics than half-baked probabilistic pipelines.
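A quick check of the symmetric case (synthetic demand; the cost and candidate grid are illustrative): with a symmetric cost and near-Gaussian demand, the percentile objective picks essentially the same decision as the mean objective.

```python
import random
import statistics

random.seed(0)
demand = [random.gauss(100, 10) for _ in range(4000)]  # symmetric, near-Gaussian

def p75(xs):
    return statistics.quantiles(xs, n=4)[2]

def best(objective):
    # Symmetric cost |theta - d|: over- and under-shoot penalised equally.
    return min(range(80, 121),
               key=lambda t: objective([abs(t - d) for d in demand]))

# Mean-optimal and P75-optimal decisions agree (up to grid resolution):
# there is no asymmetric tail for the percentile objective to exploit.
assert abs(best(statistics.mean) - best(p75)) <= 1
```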
