PATTERN Cited by 1 source
Occupancy-based load signal
Definition
An occupancy-based load signal replaces the traditional in-flight request count or throughput (requests/second) with seconds of work per second as the metric for load-balancing decisions. It is calculated as `total_occupied_time / window_duration` over a sliding window, so it measures how busy a pod actually was over time rather than relying on a snapshot count.
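As a minimal sketch of the definition, the ratio can be computed directly from the durations of requests handled in one window. The function name `occupancy` and its parameters are illustrative assumptions, not names from the source.

```python
def occupancy(durations_s, window_s):
    """Seconds of work per second of wall-clock time over one window.

    durations_s: durations (in seconds) of requests handled in the window
    window_s:    length of the window in seconds
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    return sum(durations_s) / window_s


# 1000 requests of 1 ms each, all handled within a 1 s window
busy = occupancy([0.001] * 1000, window_s=1.0)
print(round(busy, 3))  # 1.0
```

A value near 1.0 means the pod spent roughly the whole window doing work; values above 1.0 indicate overlapping (concurrent) requests.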
Why It Beats Alternatives
| Signal | Reported load (1000 rps, 1ms each) | Problem |
|---|---|---|
| In-flight | 0 (sampled between bursts) | Instantaneous, local, drops to zero between bursts |
| Throughput | 1,000 | Overstates load for fast responses; a pod at 1000 rps with 1ms latency is barely loaded |
| Occupancy | ~1.0 | 1000 × 1ms = 1.0s of work per second — true busyness |
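Occupancy is grounded in Little's Law (L = λW): with an arrival rate of λ = 1000 requests/s and a mean time in the system of W = 1 ms, the average number of requests in the system is L = 1.0. To measure it, accumulate request durations in a sliding window and divide by the window length. The result holds steady between bursts and rises with real work.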
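The sliding-window accumulation can be sketched with a small ring of fixed-size buckets. Everything here is an assumption for illustration: the class name `OccupancyWindow`, the use of integer millisecond timestamps, the default bucket sizes, and the choice to attribute a request's whole duration to the bucket in which it completes. It is not the source's implementation.

```python
class OccupancyWindow:
    """Sliding-window occupancy from fixed-size buckets (times in integer ms)."""

    def __init__(self, bucket_ms=30, buckets=5):
        self.bucket_ms = bucket_ms
        self.n = buckets
        self.busy = [0.0] * buckets   # busy milliseconds recorded per slot
        self.epoch = [-1] * buckets   # absolute bucket index each slot holds

    def _slot(self, now_ms):
        idx = now_ms // self.bucket_ms
        slot = idx % self.n
        if self.epoch[slot] != idx:   # slot still holds an expired bucket
            self.epoch[slot] = idx
            self.busy[slot] = 0.0
        return idx, slot

    def record(self, now_ms, duration_ms):
        """Add a completed request's duration to the bucket containing now_ms."""
        _, slot = self._slot(now_ms)
        self.busy[slot] += duration_ms

    def occupancy(self, now_ms):
        """Busy time in the live buckets divided by the full window length."""
        idx, _ = self._slot(now_ms)
        oldest = idx - self.n + 1
        live = sum(b for b, e in zip(self.busy, self.epoch) if oldest <= e <= idx)
        return live / (self.bucket_ms * self.n)


w = OccupancyWindow()
for t in range(150):          # one 1 ms request completes every millisecond
    w.record(t, 1)
print(w.occupancy(149))       # 1.0
print(w.occupancy(179))       # 0.8
print(w.occupancy(300))       # 0.0
```

Because the newest bucket is only partly elapsed when read, this sketch slightly understates occupancy right after a bucket boundary; a production version might scale by the elapsed portion instead.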
Implementation Details (Zalando PRAPI)
- Window: 150ms, split into five 30ms sliding buckets
- Composite signal: `max(inflight, occupancy)`, which catches slow in-flight requests not yet completed AND bursty fast requests
- Latency weighting: `effectiveLoad = max(inflight, occupancy) × min(podLatency / globalLatency, 5)`; slow pods weigh more, and stuck pods (no responses) get the full 5× cap
- Walk cap: 10 hops maximum; if no pod is under the threshold within 10 hops, route to the least-loaded pod seen
- Result: balance factor loosened from 1.10 to 1.25; HPA threshold raised from 50% to 65% CPU; 25% fewer pods
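The composite signal, latency weighting, and walk cap listed above might fit together roughly as follows. This is a sketch under stated assumptions rather than PRAPI's code: the function names, the `ring` and `loads` data structures, the use of `None` to mark a pod with no latency samples, and the fallback ratio of 1.0 when no global latency is known are all hypothetical. How the threshold is derived from the balance factor is left to the caller.

```python
LATENCY_CAP = 5.0   # from the source: latency ratio is capped at 5x
MAX_HOPS = 10       # from the source: inspect at most 10 pods


def effective_load(inflight, occupancy, pod_latency, global_latency):
    """max(inflight, occupancy) weighted by the capped latency ratio."""
    base = max(inflight, occupancy)
    if pod_latency is None:
        ratio = LATENCY_CAP            # assumption: None marks a stuck pod
    elif global_latency <= 0:
        ratio = 1.0                    # assumption: no baseline, no weighting
    else:
        ratio = min(pod_latency / global_latency, LATENCY_CAP)
    return base * ratio


def pick_pod(ring, start, loads, threshold):
    """Walk the ring from `start`; return the first pod under `threshold`.

    If none of the first MAX_HOPS pods qualifies, return the least-loaded
    pod seen during the walk. Returns None for an empty ring.
    """
    best = None
    for hop in range(min(MAX_HOPS, len(ring))):
        pod = ring[(start + hop) % len(ring)]
        if loads[pod] < threshold:
            return pod
        if best is None or loads[pod] < loads[best]:
            best = pod
    return best


ring = ["pod-a", "pod-b", "pod-c"]
loads = {
    "pod-a": effective_load(3, 0.4, 0.020, 0.010),   # slow pod: 3 * 2.0
    "pod-b": effective_load(0, 0.5, 0.010, 0.010),   # 0.5 * 1.0
    "pod-c": effective_load(1, 0.2, None, 0.010),    # stuck pod: 1 * 5.0
}
print(pick_pod(ring, 0, loads, threshold=1.0))       # pod-b
```

Taking `max(inflight, occupancy)` before weighting means a pod with one long-running request and a pod absorbing a burst of fast requests both register as loaded, which is the behaviour the list describes.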
Seen In
- systems/zalando-prapi — replaced in-flight count for bounded-load decisions in client-side load balancer (Source: sources/2026-06-22-zalando-client-side-load-balancing)
- Twitter Finagle — latency-weighted load balancing concept (referenced in article)