PATTERN

DES + gradient-free optimiser under uncertainty

Problem

Many real-world decision problems have the following combination of properties:

  • Stochastic inputs — demand, lead times, arrival rates, failure times — each with a distribution rather than a point value.
  • Branching state dynamics — stockouts truncate fulfilment; order decisions depend on post-fulfilment inventory level; returns arrive later conditional on sales.
  • No closed-form objective. The cost of a decision θ can only be computed by simulating the state evolution forward — no analytic E[cost(θ)] exists.
  • Non-trivial decision space. θ may be multi-dimensional (Zalando: (t₀, Q₀, s, Q) — 4 continuous variables per article × merchant pair).
  • Non-differentiable output. Simulation output includes branching logic (if inventory < 0 then lost_sales) — output is not smooth in θ.

Classical optimisation techniques (gradient descent, closed-form Newsvendor) fail on combinations of these properties. You need a different architectural pairing.

Pattern

Compose two specialised components:

  1. Discrete Event Simulation (DES) for forward-evaluation of a single decision θ:
     • Advance time in discrete ticks.
     • At each tick, sample stochastic inputs from their distributions.
     • Execute deterministic state transitions in a specified order.
     • Accumulate summary metrics (cost, service level) across ticks.
  2. Gradient-free black-box optimiser for searching θ-space:
     • Call DES N times per candidate θ (Monte Carlo realisations).
     • Aggregate the N DES outputs into a scalar (mean / percentile / CVaR).
     • Adapt the search based on scalar feedback — no gradients required.

The two components form a sandwich: DES is the expensive per-sample evaluator; the gradient-free optimiser sits on top, calling DES many times and guiding the search. Monte Carlo aggregation fits between them, averaging across DES realisations for a single θ before handing the scalar to the optimiser.
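The sandwich can be sketched in a few lines. This is an illustrative skeleton, not the Zalando implementation; `des_run` stands in for any DES evaluator with the signature `des_run(theta, rng) -> cost`.

```python
import numpy as np

def evaluate(theta, des_run, n_samples=200, rng=None):
    """Monte Carlo middle layer: run the DES n_samples times for one
    candidate theta and collapse the costs to a single scalar that the
    gradient-free optimiser on top can consume."""
    rng = rng or np.random.default_rng()
    costs = np.array([des_run(theta, rng) for _ in range(n_samples)])
    return float(costs.mean())  # or a percentile / CVaR of `costs`
```

The optimiser never sees individual realisations, only the aggregated scalar, which is what makes the DES layer swappable.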

Canonical pipeline

candidate θ
      │
      ▼
┌───────────────────────────────────────────────┐
│ Monte Carlo loop (N samples)                  │
│   for i in 1..N:                              │
│     sample demand[i], lead_times[i], ...      │
│     cost[i] = DES(θ, samples[i])              │
│   return aggregate(cost)  # mean, P75, CVaR   │
└───────────────────────────────────────────────┘
      │  scalar score
      ▼
gradient-free optimiser
  (CMA-ES / Bayesian optimisation / Nelder-Mead / …)
      │
      ▼
next θ to evaluate  (loop back to top)

The optimiser's budget is K × N DES evaluations, where K is the number of candidate θ evaluated; total compute cost is therefore K × N × DES_run_cost.
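The `aggregate(cost)` step in the diagram admits several risk postures. A minimal sketch of the three aggregators named above (the `alpha` parameter and function name are illustrative choices, not from the source system):

```python
import numpy as np

def aggregate(costs, mode="mean", alpha=0.75):
    """Collapse N Monte Carlo cost realisations into one scalar.
    alpha is the percentile level (e.g. 0.75 for P75) or, for CVaR,
    the quantile beyond which the tail is averaged."""
    costs = np.asarray(costs, dtype=float)
    if mode == "mean":
        return float(costs.mean())
    if mode == "percentile":
        return float(np.percentile(costs, 100 * alpha))
    if mode == "cvar":
        # CVaR: mean of the worst (1 - alpha) share of realisations
        tail = costs[costs >= np.quantile(costs, alpha)]
        return float(tail.mean())
    raise ValueError(f"unknown mode: {mode}")
```

Moving from `mean` to `percentile` or `cvar` changes which θ the optimiser prefers without touching the DES at all.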

Why DES specifically

Alternatives to DES as the per-sample evaluator, and why each falls short:

  • Closed-form / analytical — breaks on branching logic.
  • Continuous-time simulation (ODE integration) — wrong granularity for discrete events (orders placed, inventory checks).
  • Agent-based simulation — overkill if the state model is well-described by aggregate inventory levels rather than per-customer agents.
  • Markov Decision Process (MDP) value iteration — requires specifying a Markov transition kernel explicitly; often too restrictive for fashion-commerce state (calendar effects, non-stationarity).

DES hits the sweet spot: discrete ticks, ordered events, explicit state transitions — matches the natural structure of inventory, queueing, scheduling, and capacity problems.

Why gradient-free specifically

Alternatives to gradient-free, and why each falls short:

  • SGD / Adam — needs differentiable output; DES output has branching non-differentiability.
  • Reparameterisation tricks — can make certain Monte Carlo outputs differentiable, but adds complexity and requires re-deriving gradients per cost function change.
  • Policy gradient (REINFORCE) — high variance; works for small θ but scales poorly.
  • Analytical approximations — require linearising the simulator; loses the mechanism-level fidelity that motivates DES in the first place.

Gradient-free optimisers accept the simulator as an opaque oracle, which is exactly what DES produces. The cost: gradient-free methods scale to roughly 50 decision variables before degrading — sufficient for Zalando's 4-dimensional θ per (article, merchant), but a concern for high-dimensional joint optimisation problems.
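Treating the Monte Carlo wrapper as an opaque oracle is a one-liner with any off-the-shelf gradient-free routine. A sketch using SciPy's Nelder-Mead (one of the families named in the diagram); the quadratic `simulate` is a stand-in for a real DES, and pinning the seed inside the objective (common random numbers) keeps the surface comparable across candidate θ:

```python
import numpy as np
from scipy.optimize import minimize

def simulate(theta, rng):
    # Toy stand-in for a DES run: noisy cost with minimum near theta = 3
    return (theta[0] - 3.0) ** 2 + rng.normal(0, 0.1)

def objective(theta):
    """Opaque oracle handed to the optimiser: N Monte Carlo runs,
    aggregated to a scalar. Fixed seed per call = common random
    numbers, so noise does not shift between candidates."""
    rng = np.random.default_rng(42)
    costs = [simulate(theta, rng) for _ in range(100)]
    return float(np.mean(costs))

result = minimize(objective, x0=[0.0], method="Nelder-Mead")
```

The optimiser needs nothing from the simulator beyond the scalar return value, so the DES internals can change freely without touching the search code.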

Canonical instance (Zalando ZEOS)

systems/zeos-replenishment-recommender uses this pattern:

  • DES — 12-week horizon per run; weekly ticks; intra-week ordered events (inbound/2 → fulfilment → inbound/2 → reorder check → cost accrual).
  • Gradient-free optimiser — unnamed family; searches θ = (t₀, Q₀, s, Q) per article × merchant.
  • Aggregation — P75 percentile across Monte Carlo samples (not mean).
  • Composition with upstream probabilistic forecast — the DES samples from a probabilistic forecast of 12-week demand; patterns/probabilistic-forecast-plus-percentile-objective is the companion pattern that connects the forecaster to the objective function.
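The intra-week event order above (inbound/2 → fulfilment → inbound/2 → reorder check → cost accrual) can be sketched as a single tick function. This is an illustrative reconstruction from the ordering stated above, not ZEOS code; the cost coefficients and the `state` dict layout are assumptions.

```python
def week_tick(state, demand, arriving_qty, policy):
    """One weekly tick: half the inbound lands before fulfilment,
    half after, then the reorder check, then cost accrual."""
    s, Q = policy
    state["on_hand"] += arriving_qty // 2                  # inbound/2
    sold = min(state["on_hand"], demand)                   # fulfilment
    state["on_hand"] -= sold
    state["on_hand"] += arriving_qty - arriving_qty // 2   # inbound/2 (remainder)
    if state["on_hand"] < s:                               # reorder check
        state["on_order"] = Q
    lost = demand - sold
    state["cost"] += 1.0 * state["on_hand"] + 20.0 * lost  # cost accrual (assumed rates)
    return state
```

Splitting the inbound around fulfilment means mid-week arrivals can partially cover that week's demand, which a single end-of-week inbound event would miss.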

Tradeoffs

  • Compute cost. K × N × DES_cost per optimisation — per article × merchant. Zalando amortises via daily batch (SageMaker Batch Transform) for the common case + online Lambda for partner-initiated what-ifs.
  • Parallelism. Both axes are embarrassingly parallel — N Monte Carlo samples can run in parallel within an iteration; K candidate θ's can often be evaluated in parallel in the optimiser. Zalando's Lambda path exploits this.
  • No exploitation of problem structure. Gradient-free optimisers treat DES as a black box; any structure that would have helped (convexity regions, submodularity, monotonicity) is ignored. Specialised inventory-theory results can outperform when they apply.
  • Reproducibility. DES + gradient-free is stochastic on two axes (Monte Carlo sample draws, optimiser random seed). Re-running gives slightly different θ*. Production use requires seed-pinning or multi-run averaging.
  • Hyperparameters. Every component has knobs: DES tick size + event order, Monte Carlo N, percentile α, optimiser family + iteration budget + restart strategy. Each knob changes output; tuning them per-problem is ongoing.
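The seed-pinning mitigation from the reproducibility tradeoff can be implemented by deriving a fixed child seed per Monte Carlo sample. A minimal sketch (the function name and `base_seed` value are illustrative):

```python
import numpy as np

def evaluate_pinned(theta, des_run, n=100, base_seed=1234):
    """Seed-pinned Monte Carlo evaluation: sample i always gets the
    RNG seeded with (base_seed, i), so re-running the optimisation
    reproduces the same cost surface, and hence the same theta*."""
    costs = [des_run(theta, np.random.default_rng((base_seed, i)))
             for i in range(n)]
    return float(np.mean(costs))
```

This also gives common random numbers across candidate θ for free, which reduces the noise the optimiser has to fight.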

When to apply

  • Operations-research problems with probabilistic inputs. Inventory, queueing, scheduling, capacity planning — all natural fits.
  • Risk-sensitive decisions. When tail cost matters (combine with percentile objective).
  • Moderate decision dimensionality. Up to ~50 parameters per optimisation problem.
  • Enough compute for Monte Carlo. Lambda-timescale (seconds-to-minutes) or batch-timescale (daily) evaluation.

When NOT to apply

  • High-dimensional θ (>100 parameters). Gradient-free optimisers degrade; use policy-gradient RL or reparameterised stochastic gradient methods.
  • Problem has closed-form solution. Newsvendor has a closed-form critical-fractile answer — no DES needed. Use it.
  • Real-time (sub-second) decisions. DES + gradient-free optimiser is too slow for latency-sensitive paths. Pre-compute θ* offline, serve from a cache.
  • Partial-observation / learning-while-acting required. If you need to explore + exploit online, use reinforcement-learning frameworks instead.

Seen in
