
PATTERN Cited by 1 source

Occupancy-based load signal

Definition

An occupancy-based load signal replaces the traditional in-flight request count or throughput (requests/second) with seconds of work per second as the metric for load-balancing decisions. It is calculated as total_occupied_time / window_duration over a sliding window, so it reflects actual pod busyness measured in time rather than in snapshot counts.
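A minimal sketch of the definition, assuming completed requests are recorded as (start, end) timestamps in seconds and in completion order. The class name OccupancyWindow and its methods are hypothetical, chosen only for illustration; a production version would likely use fixed buckets rather than storing every request.

```python
from collections import deque


class OccupancyWindow:
    """Seconds of work per second over a sliding window (illustrative helper)."""

    def __init__(self, window_s):
        self.window_s = window_s
        # (start, end) pairs of completed requests, appended in completion order.
        self._spans = deque()

    def record(self, start, end):
        """Record a completed request that kept the pod busy from start to end."""
        self._spans.append((start, end))

    def occupancy(self, now):
        """Return total_occupied_time / window_duration for [now - window_s, now]."""
        window_start = now - self.window_s
        # Drop requests that finished before the window began.
        while self._spans and self._spans[0][1] <= window_start:
            self._spans.popleft()
        # Count only the part of each request that overlaps the window.
        occupied = sum(
            max(0.0, min(end, now) - max(start, window_start))
            for start, end in self._spans
        )
        return occupied / self.window_s
```

For example, two requests of 30 ms and 45 ms inside a 150 ms window give 0.075 / 0.150 = 0.5 seconds of work per second.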

Why It Beats Alternatives

Signal | Reported load (1000 rps, 1 ms each) | Problem
In-flight | 0 (sampled between bursts) | Instantaneous, local, drops to zero between bursts
Throughput | 1,000 | Overstates load for fast responses; a pod at 1000 rps with 1 ms latency is barely loaded
Occupancy | ~1.0 | 1000 × 1 ms = 1.0 s of work per second: true busyness

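Occupancy is grounded in Little's Law (L = λW), where L is the average number of requests in progress, λ is the arrival rate, and W is the average time each request takes. Accumulating request durations in a sliding window and dividing by the window length yields L directly: at λ = 1000 rps and W = 1 ms, L = 1.0. The signal holds steady between bursts and rises with real work.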
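The three signals can be compared on a small synthetic trace. The sketch below assumes one particular traffic shape that matches the 1000 rps, 1 ms case: ten bursts per second, each of 100 concurrent requests lasting 1 ms. The function name and parameters are hypothetical and the trace is illustrative, not measured data.

```python
def simulate_signals(bursts_per_s=10, burst_size=100, duration_s=0.001, sample_at=0.05):
    """Return (in-flight at sample_at, throughput in rps, occupancy) over one second."""
    # Burst k starts at k / bursts_per_s; all requests in a burst run concurrently.
    requests = [
        (k / bursts_per_s, k / bursts_per_s + duration_s)
        for k in range(bursts_per_s)
        for _ in range(burst_size)
    ]
    # In-flight: a snapshot of how many requests are open at one instant.
    inflight = sum(1 for start, end in requests if start <= sample_at < end)
    # Throughput: completed requests per second over the one-second trace.
    throughput = len(requests) / 1.0
    # Occupancy: summed request time divided by the one-second window.
    occupancy = sum(end - start for start, end in requests) / 1.0
    return inflight, throughput, occupancy


inflight, throughput, occupancy = simulate_signals()
print(inflight, throughput, round(occupancy, 6))
```

Sampling at 50 ms lands between bursts, so the in-flight snapshot is 0 even though the summed work is about 1.0 second per second, which is the λW value from Little's Law.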

Implementation Details (Zalando PRAPI)

  • Window: 150ms, split into five 30ms sliding buckets
  • Composite signal: max(inflight, occupancy) — catches slow in-flight requests not yet completed AND bursty fast requests
  • Latency weighting: effectiveLoad = max(inflight, occupancy) × min(podLatency / globalLatency, 5) — slow pods weigh more; stuck pods (no responses) get full 5× cap
  • Walk cap: 10 hops maximum; if no pod under threshold in 10, route to least-loaded seen
  • Result: balance factor loosened from 1.10 to 1.25; HPA threshold raised from 50% to 65% CPU; 25% fewer pods
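The composite signal, latency weighting, and walk cap above can be sketched as follows. This is an illustration under assumptions, not the PRAPI implementation: PodStats, effective_load, and pick_pod are hypothetical names, a stuck pod is modeled as latency_s = None, the walk order and the threshold are supplied by the caller, and global_latency_s is assumed to be positive.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

LATENCY_CAP = 5.0  # from the text: latency ratio is capped at 5x
MAX_HOPS = 10      # from the text: inspect at most 10 pods per request


@dataclass
class PodStats:
    name: str
    inflight: int
    occupancy: float
    latency_s: Optional[float]  # None = no responses seen yet (assumed stuck-pod model)


def effective_load(pod: PodStats, global_latency_s: float) -> float:
    """max(inflight, occupancy) x min(podLatency / globalLatency, 5)."""
    base = max(pod.inflight, pod.occupancy)
    if pod.latency_s is None:
        weight = LATENCY_CAP  # stuck pods get the full cap
    else:
        weight = min(pod.latency_s / global_latency_s, LATENCY_CAP)
    return base * weight


def pick_pod(walk: Sequence[PodStats], threshold: float, global_latency_s: float) -> PodStats:
    """Take the first pod under the threshold, else the least loaded of those seen."""
    best = None
    best_load = float("inf")
    for pod in walk[:MAX_HOPS]:
        load = effective_load(pod, global_latency_s)
        if load < threshold:
            return pod
        if load < best_load:
            best, best_load = pod, load
    if best is None:
        raise ValueError("walk is empty")
    return best
```

Taking the max of the two base signals means a pod with long-running open requests (high in-flight, low occupancy so far) and a pod absorbing fast bursts (low in-flight snapshot, high occupancy) both register as loaded.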

Seen In
