PATTERN Cited by 1 source
# Bounded-load consistent hashing
## Definition
Bounded-load consistent hashing is a consistent-hash variant that caps the traffic any single backend can absorb at a configurable multiple of the cluster average — typically 2×. When a hot key's owner pod hits the cap, subsequent requests spill clockwise to the next non-overloaded pod on the ring. The cache locality benefit of plain consistent hashing is preserved for 99%+ of keys, while the tail of extremely-hot keys gets redistributed instead of toppling their owner.
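As a worked example of the cap arithmetic with assumed toy numbers (four pods, load measured in requests in flight; none of these figures come from the source):

```go
package main

import "fmt"

func main() {
	// Assumed toy numbers: pod 0 owns a hot key, its peers are mostly idle.
	loads := []float64{90, 30, 20, 20}
	avg := (90.0 + 30 + 20 + 20) / 4 // cluster average = 40
	ceiling := 2.0 * avg             // 2× balance factor → cap at 80
	// Pod 0 sits at 90 ≥ 80, so its next request spills clockwise to pod 1;
	// pods 1–3 are under the cap and keep serving their own keys normally.
	fmt.Println(ceiling, loads[0] >= ceiling)
}
```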
The underlying algorithm is from Mirrokni / Thorup / Zadimoghaddam (2018), "Consistent Hashing with Bounded Loads." Google Cloud Load Balancer shipped a production version; Envoy's `ring_hash` and `maglev` load balancers support bounded-load via `hash_balance_factor`.
## Why plain consistent hashing isn't enough
Consistent hashing gives you locality — the same key always routes to the same owner — but at the cost of imbalance: if one key is very hot, its owner absorbs all of it, even when peer pods are underutilised. At e-commerce scale this breaks on limited-edition drops:
> "Limited-edition releases, such as Nike sneakers, generate sudden massive traffic spikes." (Source: sources/2025-03-06-zalando-from-event-driven-chaos-to-a-blazingly-fast-serving-api.)
Without a cap, the owner pod for the hot key saturates its CPU or connection count and drops everything, including requests for the 999,999 other keys it owns; the blast radius extends far beyond the single hot key.
## How the cap works
- Each pod has a measured current load (active requests, queue depth, or some other signal).
- The cluster-wide average is tracked.
- When a request arrives and its consistent-hash owner is at or above `ceiling = balance_factor × average`, the request goes to the next pod clockwise on the ring that is below the ceiling.
- As load drains, the owner naturally takes traffic again.
Zalando's default is 2× average — tight enough to prevent overload, loose enough to preserve most cache locality.
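The spill logic above can be sketched in Go. This is a toy model, not Skipper's or Envoy's actual implementation; the type names, the FNV hash, and "requests in flight" as the load signal are all assumptions:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// pod is one backend: a position on the hash ring plus a live load signal.
type pod struct {
	name string
	pos  uint32
	load int // e.g. requests in flight
}

// ring holds pods sorted by ring position and the configured balance factor.
type ring struct {
	pods          []*pod  // sorted ascending by pos
	balanceFactor float64 // e.g. 2.0 for a 2× cap
}

func hashKey(key string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32()
}

// routeHash finds the hash's owner, then walks clockwise past every pod
// at or above ceiling = balanceFactor × average load.
func (r *ring) routeHash(h uint32) *pod {
	total := 0
	for _, p := range r.pods {
		total += p.load
	}
	avg := float64(total) / float64(len(r.pods))
	ceiling := r.balanceFactor * avg

	// Owner = first pod clockwise from h (wrapping around the ring).
	i := sort.Search(len(r.pods), func(j int) bool { return r.pods[j].pos >= h })
	for n := 0; n < len(r.pods); n++ {
		p := r.pods[(i+n)%len(r.pods)]
		if total == 0 || float64(p.load) < ceiling {
			return p // owner if under the cap, else first non-overloaded peer
		}
	}
	return r.pods[i%len(r.pods)] // everyone at the cap: fall back to the owner
}

func (r *ring) route(key string) *pod { return r.routeHash(hashKey(key)) }

func main() {
	r := &ring{
		pods:          []*pod{{"a", 100, 0}, {"b", 200, 0}, {"c", 300, 0}},
		balanceFactor: 2.0,
	}
	fmt.Println("idle cluster, hash 150 →", r.routeHash(150).name) // owner "b"

	// Make b hot: avg = 4, ceiling = 8, b's load of 10 ≥ 8 → spill to c.
	r.pods[0].load, r.pods[1].load, r.pods[2].load = 1, 10, 1
	fmt.Println("b overloaded, hash 150 →", r.routeHash(150).name) // spills to "c"
}
```

Note that the spill target is still deterministic for a given load state: it is always the next under-ceiling pod clockwise, which keeps the spilled traffic for one hot key concentrated on one peer's cache rather than sprayed across the cluster.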
## Trade-offs
- Cache-miss rate on spilled requests — the spill-over pod does not own the key in the cache; it'll miss on first touch. At 2× cap this is rare but non-zero; the balance factor tunes the miss-vs-overload trade.
- Ownership clarity weakens — a request's routed pod is no longer a pure function of the key; clients that depend on strict stickiness may need a different mechanism.
- Measurement lag matters — the "current load" signal has to update faster than the spike. Stale load signals can cause oscillation (pod A overloaded → all traffic to B → B overloaded → back to A).
## Implementation footprint
- Google Maglev / GCLB — bounded-load is the default.
- Envoy — `ring_hash` and `maglev` load balancers accept a `hash_balance_factor` integer; values in [100, ∞), with 200 = 2× cap.
- HAProxy — not natively; typically implemented with external policy.
- Zalando Skipper — added as a first-class feature in skipper#1769 (Zalando's upstream contribution from PRAPI's deployment). Pairs with the fixed-100-position placement fix in skipper#1712 to get both stable cache locality and hot-key safety from the same ingress.
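For Envoy, the `hash_balance_factor` knob sits under the cluster's `common_lb_config`. A minimal sketch of where it lives, assuming a `ring_hash` cluster (the cluster name and everything outside this fragment are placeholders, not from the source):

```yaml
clusters:
  - name: product_cache           # assumed cluster name
    lb_policy: RING_HASH
    common_lb_config:
      consistent_hashing_lb_config:
        hash_balance_factor: 200  # 200 = cap at 2× the cluster-average load; minimum is 100
```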
## Seen in
- sources/2025-03-06-zalando-from-event-driven-chaos-to-a-blazingly-fast-serving-api — Zalando's canonical production instance, paired with their open-source Skipper implementation. The post is clear that the bounded-load addition is what makes CHLB survive Cyber Week and limited-edition drops without degrading.
## Related
- concepts/consistent-hashing — the base algorithm this variant extends
- patterns/power-of-two-choices — the alternative used by PRAPI's batch component (P2C when stickiness is not required)
- systems/skipper-proxy · systems/zalando-prapi
- concepts/cache-hit-rate — the property preserved (mostly) under bounded-load