PATTERN

DES + gradient-free optimiser under uncertainty

Problem

Many real-world decision problems have the following combination of properties:

  • Stochastic inputs — demand, lead times, arrival rates, failure times — each with a distribution rather than a point value.
  • Branching state dynamics — stockouts truncate fulfilment; order decisions depend on post-fulfilment inventory level; returns arrive later conditional on sales.
  • No closed-form objective. The cost of a decision θ can only be computed by simulating the state evolution forward — no analytic E[cost(θ)] exists.
  • Non-trivial decision space. θ may be multi-dimensional (Zalando: (t₀, Q₀, s, Q) — 4 continuous variables per article × merchant pair).
  • Non-differentiable output. Simulation output includes branching logic (if inventory < 0 then lost_sales) — output is not smooth in θ.

Classical optimisation techniques (gradient descent, closed-form Newsvendor) fail on combinations of these properties. You need a different architectural pairing.

Pattern

Compose two specialised components:

  1. Discrete Event Simulation (DES) for forward-evaluation of a single decision θ:
     • Advance time in discrete ticks.
     • At each tick, sample stochastic inputs from their distributions.
     • Execute deterministic state transitions in a specified order.
     • Accumulate summary metrics (cost, service level) across ticks.
  2. Gradient-free black-box optimiser for searching θ-space:
     • Call DES N times per candidate θ (Monte Carlo realisations).
     • Aggregate the N DES outputs into a scalar (mean / percentile / CVaR).
     • Adapt the search based on scalar feedback — no gradients required.

The two components form a sandwich: DES is the expensive per-sample evaluator; the gradient-free optimiser sits on top, calling DES many times and guiding the search. Monte Carlo aggregation fits between them, averaging across DES realisations for a single θ before handing the scalar to the optimiser.
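The sandwich can be sketched in a few lines. This is an illustrative skeleton, not the Zalando implementation; `des_run` stands in for any DES evaluator with the signature `des_run(theta, rng) -> cost`.

```python
import numpy as np

def evaluate(theta, des_run, n_samples=200, rng=None):
    """Monte Carlo middle layer: run the DES n_samples times for one
    candidate theta and collapse the costs to a single scalar that the
    gradient-free optimiser on top can consume."""
    rng = rng or np.random.default_rng()
    costs = np.array([des_run(theta, rng) for _ in range(n_samples)])
    return float(costs.mean())  # or a percentile / CVaR of `costs`
```

The optimiser never sees individual realisations, only the aggregated scalar, which is what makes the DES layer swappable.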

Canonical pipeline

candidate θ
      │
      ▼
┌───────────────────────────────────────────────┐
│ Monte Carlo loop (N samples)                  │
│   for i in 1..N:                              │
│     sample demand[i], lead_times[i], ...      │
│     cost[i] = DES(θ, samples[i])              │
│   return aggregate(cost)  # mean, P75, CVaR   │
└───────────────────────────────────────────────┘
      │  scalar score
      ▼
gradient-free optimiser
  (CMA-ES / Bayesian optimisation / Nelder-Mead / …)
      │
      ▼
next θ to evaluate  (loop back to top)

The optimiser's budget is K × N DES evaluations, where K is the number of candidate θ evaluated; total compute cost is therefore K × N × DES_run_cost.
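The `aggregate(cost)` step in the diagram admits several risk postures. A minimal sketch of the three aggregators named above (the `alpha` parameter and function name are illustrative choices, not from the source system):

```python
import numpy as np

def aggregate(costs, mode="mean", alpha=0.75):
    """Collapse N Monte Carlo cost realisations into one scalar.
    alpha is the percentile level (e.g. 0.75 for P75) or, for CVaR,
    the quantile beyond which the tail is averaged."""
    costs = np.asarray(costs, dtype=float)
    if mode == "mean":
        return float(costs.mean())
    if mode == "percentile":
        return float(np.percentile(costs, 100 * alpha))
    if mode == "cvar":
        # CVaR: mean of the worst (1 - alpha) share of realisations
        tail = costs[costs >= np.quantile(costs, alpha)]
        return float(tail.mean())
    raise ValueError(f"unknown mode: {mode}")
```

Moving from `mean` to `percentile` or `cvar` changes which θ the optimiser prefers without touching the DES at all.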

Why DES specifically

Alternatives to DES as the per-sample evaluator, and why each falls short:

  • Closed-form / analytical — breaks on branching logic.
  • Continuous-time simulation (ODE integration) — wrong granularity for discrete events (orders placed, inventory checks).
  • Agent-based simulation — overkill if the state model is well-described by aggregate inventory levels rather than per-customer agents.
  • Markov Decision Process (MDP) value iteration — requires specifying a Markov transition kernel explicitly; often too restrictive for fashion-commerce state (calendar effects, non-stationarity).

DES hits the sweet spot: discrete ticks, ordered events, explicit state transitions — matches the natural structure of inventory, queueing, scheduling, and capacity problems.

Why gradient-free specifically

Alternatives to gradient-free, and why each falls short:

  • SGD / Adam — needs differentiable output; DES output has branching non-differentiability.
  • Reparameterisation tricks — can make certain Monte Carlo outputs differentiable, but adds complexity and requires re-deriving gradients per cost function change.
  • Policy gradient (REINFORCE) — high variance; works for small θ but scales poorly.
  • Analytical approximations — require linearising the simulator; loses the mechanism-level fidelity that motivates DES in the first place.

Gradient-free optimisers accept the simulator as an opaque oracle, which is exactly what DES produces. The cost: gradient-free methods scale to roughly 50 decision variables before degrading — sufficient for Zalando's 4-dimensional θ per (article, merchant), but a concern for high-dimensional joint optimisation problems.
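Treating the Monte Carlo wrapper as an opaque oracle is a one-liner with any off-the-shelf gradient-free routine. A sketch using SciPy's Nelder-Mead (one of the families named in the diagram); the quadratic `simulate` is a stand-in for a real DES, and pinning the seed inside the objective (common random numbers) keeps the surface comparable across candidate θ:

```python
import numpy as np
from scipy.optimize import minimize

def simulate(theta, rng):
    # Toy stand-in for a DES run: noisy cost with minimum near theta = 3
    return (theta[0] - 3.0) ** 2 + rng.normal(0, 0.1)

def objective(theta):
    """Opaque oracle handed to the optimiser: N Monte Carlo runs,
    aggregated to a scalar. Fixed seed per call = common random
    numbers, so noise does not shift between candidates."""
    rng = np.random.default_rng(42)
    costs = [simulate(theta, rng) for _ in range(100)]
    return float(np.mean(costs))

result = minimize(objective, x0=[0.0], method="Nelder-Mead")
```

The optimiser needs nothing from the simulator beyond the scalar return value, so the DES internals can change freely without touching the search code.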

Canonical instance (Zalando ZEOS)

systems/zeos-replenishment-recommender uses this pattern:

  • DES — 12-week horizon per run; weekly ticks; intra-week ordered events (inbound/2 → fulfilment → inbound/2 → reorder check → cost accrual).
  • Gradient-free optimiser — unnamed family; searches θ = (t₀, Q₀, s, Q) per article × merchant.
  • Aggregation — P75 percentile across Monte Carlo samples (not mean).
  • Composition with upstream probabilistic forecast — the DES samples from a probabilistic forecast of 12-week demand; patterns/probabilistic-forecast-plus-percentile-objective is the companion pattern that connects the forecaster to the objective function.
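The intra-week event order above (inbound/2 → fulfilment → inbound/2 → reorder check → cost accrual) can be sketched as a single tick function. This is an illustrative reconstruction from the ordering stated above, not ZEOS code; the cost coefficients and the `state` dict layout are assumptions.

```python
def week_tick(state, demand, arriving_qty, policy):
    """One weekly tick: half the inbound lands before fulfilment,
    half after, then the reorder check, then cost accrual."""
    s, Q = policy
    state["on_hand"] += arriving_qty // 2                  # inbound/2
    sold = min(state["on_hand"], demand)                   # fulfilment
    state["on_hand"] -= sold
    state["on_hand"] += arriving_qty - arriving_qty // 2   # inbound/2 (remainder)
    if state["on_hand"] < s:                               # reorder check
        state["on_order"] = Q
    lost = demand - sold
    state["cost"] += 1.0 * state["on_hand"] + 20.0 * lost  # cost accrual (assumed rates)
    return state
```

Splitting the inbound around fulfilment means mid-week arrivals can partially cover that week's demand, which a single end-of-week inbound event would miss.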

Tradeoffs

  • Compute cost. K × N × DES_cost per optimisation — per article × merchant. Zalando amortises via daily batch (SageMaker Batch Transform) for the common case + online Lambda for partner-initiated what-ifs.
  • Parallelism. Both axes are embarrassingly parallel — N Monte Carlo samples can run in parallel within an iteration; K candidate θ's can often be evaluated in parallel in the optimiser. Zalando's Lambda path exploits this.
  • No exploitation of problem structure. Gradient-free optimisers treat DES as a black box; any structure that would have helped (convexity regions, submodularity, monotonicity) is ignored. Specialised inventory-theory results can outperform when they apply.
  • Reproducibility. DES + gradient-free is stochastic on two axes (Monte Carlo sample draws, optimiser random seed). Re-running gives slightly different θ*. Production use requires seed-pinning or multi-run averaging.
  • Hyperparameters. Every component has knobs: DES tick size + event order, Monte Carlo N, percentile α, optimiser family + iteration budget + restart strategy. Each knob changes output; tuning them per-problem is ongoing.
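The seed-pinning mitigation from the reproducibility tradeoff can be implemented by deriving a fixed child seed per Monte Carlo sample. A minimal sketch (the function name and `base_seed` value are illustrative):

```python
import numpy as np

def evaluate_pinned(theta, des_run, n=100, base_seed=1234):
    """Seed-pinned Monte Carlo evaluation: sample i always gets the
    RNG seeded with (base_seed, i), so re-running the optimisation
    reproduces the same cost surface, and hence the same theta*."""
    costs = [des_run(theta, np.random.default_rng((base_seed, i)))
             for i in range(n)]
    return float(np.mean(costs))
```

This also gives common random numbers across candidate θ for free, which reduces the noise the optimiser has to fight.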

When to apply

  • Operations-research problems with probabilistic inputs. Inventory, queueing, scheduling, capacity planning — all natural fits.
  • Risk-sensitive decisions. When tail cost matters (combine with percentile objective).
  • Moderate decision dimensionality. Up to ~50 parameters per optimisation problem.
  • Enough compute for Monte Carlo. Lambda-timescale (seconds-to-minutes) or batch-timescale (daily) evaluation.

When NOT to apply

  • High-dimensional θ (>100 parameters). Gradient-free optimisers degrade; use policy-gradient RL or reparameterised stochastic gradient methods.
  • Problem has closed-form solution. Newsvendor has a closed-form critical-fractile answer — no DES needed. Use it.
  • Real-time (sub-second) decisions. DES + gradient-free optimiser is too slow for latency-sensitive paths. Pre-compute θ* offline, serve from a cache.
  • Partial-observation / learning-while-acting required. If you need to explore + exploit online, use reinforcement-learning frameworks instead.

Seen in
