PATTERN Cited by 1 source
DES + gradient-free optimiser under uncertainty¶
Problem¶
Many real-world decision problems have the following combination of properties:
- Stochastic inputs — demand, lead times, arrival rates, failure times — each with a distribution rather than a point value.
- Branching state dynamics — stockouts truncate fulfilment; order decisions depend on post-fulfilment inventory level; returns arrive later conditional on sales.
- No closed-form objective. The cost of a decision θ can only
be computed by simulating the state evolution forward — no
analytic
E[cost(θ)]exists. - Non-trivial decision space. θ may be multi-dimensional
(Zalando:
(t₀, Q₀, s, Q)— 4 continuous variables per article × merchant pair). - Non-differentiable output. Simulation output includes branching logic (if inventory < 0 then lost_sales) — output is not smooth in θ.
Classical optimisation techniques (gradient descent, closed-form Newsvendor) fail on combinations of these properties. You need a different architectural pairing.
Pattern¶
Compose two specialised components:
- Discrete Event Simulation for forward-evaluation of a single decision θ.
- Advance time in discrete ticks.
- At each tick, sample stochastic inputs from their distributions.
- Execute deterministic state transitions in a specified order.
- Accumulate summary metrics (cost, service level) across ticks.
- Gradient-free black-box optimiser for searching θ-space.
- Call DES N times per candidate θ (Monte Carlo realisations).
- Aggregate the N DES outputs into a scalar (mean / percentile / CVaR).
- Adapt search based on scalar feedback — no gradients required.
The two components form a sandwich: DES is the expensive per-sample evaluator; the gradient-free optimiser sits on top, calling DES many times and guiding the search. Monte Carlo aggregation fits between them, averaging across DES realisations for a single θ before handing the scalar to the optimiser.
Canonical pipeline¶
candidate θ
│
▼
┌───────────────────────────────────────────────┐
│ Monte Carlo loop (N samples) │
│ for i in 1..N: │
│ sample demand[i], lead_times[i], ... │
│ cost[i] = DES(θ, samples[i]) │
│ return aggregate(cost) # mean, P75, CVaR │
└───────────────────────────────────────────────┘
│
▼
scalar score
│
▼
gradient-free optimiser
│ (CMA-ES / Bayesian optimisation / Nelder-Mead / …)
▼
next θ to evaluate
The optimiser's budget is K × N DES evaluations where K is
the number of candidate θ evaluated; amortised compute cost is
K × N × DES_run_cost.
Why DES specifically¶
Alternatives to DES as the per-sample evaluator, and why each falls short:
- Closed-form / analytical — breaks on branching logic.
- Continuous-time simulation (ODE integration) — wrong granularity for discrete events (orders placed, inventory checks).
- Agent-based simulation — overkill if the state model is well-described by aggregate inventory levels rather than per-customer agents.
- Markov Decision Process (MDP) value iteration — requires specifying a Markov transition kernel explicitly; often too restrictive for fashion-commerce state (calendar effects, non-stationarity).
DES hits the sweet spot: discrete ticks, ordered events, explicit state transitions — matches the natural structure of inventory, queueing, scheduling, and capacity problems.
Why gradient-free specifically¶
Alternatives to gradient-free, and why each falls short:
- SGD / Adam — needs differentiable output; DES output has branching non-differentiability.
- Reparameterisation tricks — can make certain Monte Carlo outputs differentiable, but adds complexity and requires re-deriving gradients per cost function change.
- Policy gradient (REINFORCE) — high variance; works for small θ but scales poorly.
- Analytical approximations — require linearising the simulator; loses the mechanism-level fidelity that motivates DES in the first place.
Gradient-free optimisers accept the simulator as an opaque oracle, which is exactly what DES produces. The cost: gradient-free methods scale to ~50 decision variables before degrading — sufficient for Zalando's 4-dimensional θ per (article, merchant) but would be a concern for high-dimensional joint optimisation problems.
Canonical instance (Zalando ZEOS)¶
systems/zeos-replenishment-recommender uses this pattern:
- DES — 12-week horizon per run; weekly ticks; intra-week ordered events (inbound/2 → fulfilment → inbound/2 → reorder check → cost accrual).
- Gradient-free optimiser — unnamed family; searches
θ = (t₀, Q₀, s, Q)per article × merchant. - Aggregation — P75 percentile across Monte Carlo samples (not mean).
- Composition with upstream probabilistic forecast — the DES samples from a probabilistic forecast of 12-week demand; patterns/probabilistic-forecast-plus-percentile-objective is the companion pattern that connects the forecaster to the objective function.
Tradeoffs¶
- Compute cost.
K × N × DES_costper optimisation — per article × merchant. Zalando amortises via daily batch (SageMaker Batch Transform) for the common case + online Lambda for partner-initiated what-ifs. - Parallelism. Both axes are embarrassingly parallel — N Monte Carlo samples can run in parallel within an iteration; K candidate θ's can often be evaluated in parallel in the optimiser. Zalando's Lambda path exploits this.
- No exploitation of problem structure. Gradient-free optimisers treat DES as a black box; any structure that would have helped (convexity regions, submodularity, monotonicity) is ignored. Specialised inventory-theory results can outperform when they apply.
- Reproducibility. DES + gradient-free is stochastic on two axes (Monte Carlo sample draws, optimiser random seed). Re-running gives slightly different θ*. Production use requires seed-pinning or multi-run averaging.
- Hyperparameters. Every component has knobs: DES tick size + event order, Monte Carlo N, percentile α, optimiser family + iteration budget + restart strategy. Each knob changes output; tuning them per-problem is ongoing.
When to apply¶
- Operations-research problems with probabilistic inputs. Inventory, queueing, scheduling, capacity planning — all natural fits.
- Risk-sensitive decisions. When tail cost matters (combine with percentile objective).
- Moderate decision dimensionality. Up to ~50 parameters per optimisation problem.
- Enough compute for Monte Carlo. Lambda-timescale (seconds-to-minutes) or batch-timescale (daily) evaluation.
When NOT to apply¶
- High-dimensional θ (>100 parameters). Gradient-free optimisers degrade; use policy-gradient RL or reparameterised stochastic gradient methods.
- Problem has closed-form solution. Newsvendor has a closed-form critical-fractile answer — no DES needed. Use it.
- Real-time (sub-second) decisions. DES + gradient-free optimiser is too slow for latency-sensitive paths. Pre- compute θ* offline, serve from a cache.
- Partial-observation / learning-while-acting required. If you need to explore + exploit online, use reinforcement-learning frameworks instead.
Seen in¶
- sources/2026-01-14-zalando-paper-announcement-replenishment-optimization-extended-rsq — canonical first disclosure of DES + gradient-free optimiser pairing at wiki scale. Nature Scientific Reports paper announcement names both components explicitly and the architectural separation between DES (per-sample evaluator) and the black-box optimiser (θ search layer) is load-bearing in the paper's framing.
- sources/2025-06-29-zalando-building-a-dynamic-inventory-optimisation-system-a-deep-dive — upstream architecture disclosure; names Monte Carlo + black-box optimiser but does not disclose DES as the specific simulator shape (that disclosure comes in the 2026-01-14 paper announcement).
Related¶
- concepts/discrete-event-simulation — the per-sample evaluator.
- concepts/monte-carlo-simulation-under-uncertainty — the aggregation layer between DES and the optimiser.
- concepts/gradient-free-black-box-optimisation — the θ-search layer.
- concepts/percentile-objective-optimisation — the typical aggregation function across Monte Carlo samples.
- concepts/extended-r-s-q-policy — the policy family whose θ is searched by this pattern in Zalando's instance.
- patterns/probabilistic-forecast-plus-percentile-objective — companion pattern on the upstream-forecast + objective side.
- patterns/two-stage-forecast-plus-optimisation-pipeline — higher-altitude parent pattern (cadence decoupling).
- systems/zeos-replenishment-recommender
- companies/zalando