PATTERN Cited by 1 source

Occupancy-based load signal

Definition

An occupancy-based load signal replaces the traditional in-flight request count or throughput (requests per second) with seconds of work per second as the metric for load-balancing decisions. It is calculated as total_occupied_time / window_duration over a sliding window, so it measures how busy a pod is over an interval of time rather than at a single sampled instant.
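As a minimal sketch of the formula above (not the PRAPI implementation), occupancy can be computed from the durations of the requests observed in one window. The function name and the choice of seconds as the unit are assumptions made for illustration.

```python
def occupancy(durations_s, window_s):
    """Return seconds of work per second of wall-clock time for one window.

    durations_s: durations (in seconds) of requests observed in the window.
    window_s:    length of the window in seconds.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    return sum(durations_s) / window_s


# 1000 requests of 1 ms each, observed over a 1 s window.
load = occupancy([0.001] * 1000, 1.0)
```

The same 1000 requests would report a throughput of 1,000, while `load` comes out at roughly 1.0, matching the occupancy row in the comparison table.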

Why It Beats Alternatives

Signal	Reported load (1000 rps, 1ms each)	Problem
In-flight	0 (sampled between bursts)	Instantaneous, local, drops to zero between bursts
Throughput	1,000	Overstates load for fast responses; a pod at 1000 rps with 1ms latency is barely loaded
Occupancy	~1.0	1000 × 1ms = 1.0s of work per second — true busyness

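Occupancy is grounded in Little's Law (L = λW), where λ is the arrival rate and W is the mean time a request spends in the system: accumulate request durations in a sliding window, then divide by the window length. The result is the average number of requests in service (here λ = 1000/s and W = 0.001 s give L = 1.0), so it holds steady between bursts and rises only with real work.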
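The accumulate-and-divide step can be sketched with a ring of fixed-size buckets. This is an illustrative approximation only: the class name, the integer-millisecond timestamps, and the default of five 30 ms buckets are assumptions, and because the newest bucket is still filling, the value slightly understates load early in each bucket.

```python
class OccupancyWindow:
    """Accumulates completed-request durations in fixed-size time buckets."""

    def __init__(self, bucket_ms=30, buckets=5):
        self.bucket_ms = bucket_ms
        self.n = buckets
        self.sums = [0.0] * buckets   # occupied milliseconds per slot
        self.ids = [-1] * buckets     # absolute bucket index held by each slot

    def _slot(self, now_ms):
        idx = now_ms // self.bucket_ms
        slot = idx % self.n
        if self.ids[slot] != idx:     # slot still holds an expired bucket
            self.ids[slot] = idx
            self.sums[slot] = 0.0
        return slot

    def record(self, now_ms, duration_ms):
        """Add one completed request's duration to the current bucket."""
        self.sums[self._slot(now_ms)] += duration_ms

    def occupancy(self, now_ms):
        """Occupied time in the last n buckets divided by the window length."""
        newest = now_ms // self.bucket_ms
        oldest = newest - self.n + 1
        total = sum(s for s, i in zip(self.sums, self.ids) if oldest <= i <= newest)
        return total / (self.bucket_ms * self.n)


window = OccupancyWindow()
for t in range(150):          # one 1 ms request completing every millisecond
    window.record(t, 1.0)
busy = window.occupancy(149)  # window fully covered by work
idle = window.occupancy(300)  # all buckets have aged out
```

Expired buckets are skipped at read time rather than cleared by a timer, which keeps the sketch free of background work.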

Implementation Details (Zalando PRAPI)

Window: 150ms, split into five 30ms sliding buckets
Composite signal: max(inflight, occupancy) — catches slow in-flight requests not yet completed AND bursty fast requests
Latency weighting: effectiveLoad = max(inflight, occupancy) × min(podLatency / globalLatency, 5) — slow pods weigh more; stuck pods (no responses) get full 5× cap
Walk cap: 10 hops maximum; if no pod is under the threshold within 10 hops, route to the least-loaded pod seen
Result: balance factor loosened from 1.10 to 1.25; HPA threshold raised from 50% to 65% CPU; 25% fewer pods
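The composite signal, latency weighting, and walk cap listed above might be combined roughly as follows. This is a sketch under stated assumptions, not PRAPI's code: the function names, the use of `None` to mark a pod with no responses, the neutral weight when no global latency is known, and the `candidates` sequence of (pod, load) pairs in walk order are all hypothetical. How the threshold is derived is not covered here, so it is passed in as a parameter.

```python
def effective_load(inflight, occupancy, pod_latency, global_latency, cap=5.0):
    """max(inflight, occupancy) scaled by a capped relative-latency weight."""
    if pod_latency is None:
        weight = cap                  # stuck pod (no responses): full cap
    elif global_latency <= 0:
        weight = 1.0                  # assumption: no baseline, stay neutral
    else:
        weight = min(pod_latency / global_latency, cap)
    return max(inflight, occupancy) * weight


def pick_pod(candidates, threshold, max_hops=10):
    """Return the first pod under threshold within max_hops.

    Falls back to the least-loaded pod seen during the walk, or None if
    candidates is empty.
    """
    best = None
    for hop, (pod, load) in enumerate(candidates):
        if hop >= max_hops:
            break
        if load < threshold:
            return pod
        if best is None or load < best[1]:
            best = (pod, load)
    return best[0] if best else None


slow = effective_load(2, 0.5, 0.020, 0.010)    # twice the global latency
stuck = effective_load(0, 1.2, None, 0.010)    # no responses yet
chosen = pick_pod([("a", 3.0), ("b", 1.0), ("c", 0.5)], threshold=1.5)
```

Taking the maximum of the two signals means a pod counts as loaded if either slow in-flight requests or a burst of fast requests is keeping it busy.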

Seen In

systems/zalando-prapi — replaced in-flight count for bounded-load decisions in client-side load balancer (Source: sources/2026-06-22-zalando-client-side-load-balancing)
Twitter Finagle — latency-weighted load balancing concept (referenced in article)
