
PATTERN

Bounded-load consistent hashing

Definition

Bounded-load consistent hashing is a consistent-hash variant that caps the traffic any single backend can absorb at a configurable multiple of the cluster average — typically 2×. When a hot key's owner pod hits the cap, subsequent requests spill clockwise to the next non-overloaded pod on the ring. The cache locality benefit of plain consistent hashing is preserved for 99%+ of keys, while the tail of extremely-hot keys gets redistributed instead of toppling their owner.

The underlying algorithm is from Mirrokni / Thorup / Zadimoghaddam (2018), "Consistent Hashing with Bounded Loads." Google Cloud Load Balancer shipped a production version; Envoy's ring_hash and maglev support bounded-load via hash_balance_factor.

Why plain consistent hashing isn't enough

Consistent hashing gives you locality — the same key always routes to the same owner — but at the cost of imbalance: if one key is very hot, its owner absorbs all of it, even when peer pods are underutilised. At e-commerce scale this breaks on limited-edition drops:

"Limited-edition releases, such as Nike sneakers, generate sudden massive traffic spikes." (Source: sources/2025-03-06-zalando-from-event-driven-chaos-to-a-blazingly-fast-serving-api.)

Without a cap, the owner pod for the hot key saturates CPU or connection count and drops everything, including requests for the 999,999 other keys it owns — the blast radius exceeds the single hot key.

How the cap works

  • Each pod has a measured current load (active requests, queue depth, or some other signal).
  • The cluster-wide average is tracked.
  • When a request arrives and its consistent-hash owner is at or above ceiling = balance_factor × average, the request goes to the next pod clockwise on the ring that is below the ceiling.
  • As load drains, the owner takes traffic again naturally.
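The steps above can be sketched in a few dozen lines of Python. This is an illustrative toy, not a production router: the `BoundedLoadRing` class, pod names, replica count, and the in-process load counters are all assumptions for the sketch; `balance_factor=2.0` mirrors the 2× default described here.

```python
import hashlib
from bisect import bisect


class BoundedLoadRing:
    """Toy consistent-hash ring with a bounded-load cap (hypothetical API)."""

    def __init__(self, pods, balance_factor=2.0, replicas=100):
        self.balance_factor = balance_factor
        self.load = {pod: 0 for pod in pods}  # active-request count per pod
        # Place each pod at `replicas` points on the ring, sorted by hash.
        self.ring = sorted(
            (self._hash(f"{pod}-{i}"), pod)
            for pod in pods
            for i in range(replicas)
        )
        self.points = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def _ceiling(self):
        avg = sum(self.load.values()) / len(self.load)
        # max(avg, 1.0) avoids a zero ceiling on an idle cluster.
        return self.balance_factor * max(avg, 1.0)

    def route(self, key):
        """Return the key's owner, or the next clockwise pod under the cap."""
        start = bisect(self.points, self._hash(key)) % len(self.ring)
        ceiling = self._ceiling()
        for step in range(len(self.ring)):
            pod = self.ring[(start + step) % len(self.ring)][1]
            if self.load[pod] < ceiling:
                self.load[pod] += 1
                return pod
        # Every pod is at the cap: fall back to the true owner.
        pod = self.ring[start][1]
        self.load[pod] += 1
        return pod

    def release(self, pod):
        """Call when a request completes, so the owner recovers traffic."""
        self.load[pod] -= 1
```

Routing the same hot key repeatedly without releasing shows the spill: the owner takes the first requests, hits the ceiling, and subsequent requests flow clockwise to a below-ceiling peer until the average (and hence the ceiling) rises.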

Zalando's default is 2× average — tight enough to prevent overload, loose enough to preserve most cache locality.

Trade-offs

  • Cache-miss rate on spilled requests — the spill-over pod does not hold the key in its cache, so it misses on first touch. At a 2× cap this is rare but non-zero; the balance factor tunes the miss-versus-overload trade-off.
  • Ownership clarity weakens — a request's routed pod is no longer a pure function of the key; clients that depend on strict stickiness may need a different mechanism.
  • Measurement lag matters — the "current load" signal has to update faster than the spike. Stale load signals can cause oscillation (pod A overloaded → all traffic to B → B overloaded → back to A).
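One common way to damp the oscillation described in the last bullet is to smooth the load signal before comparing it to the ceiling — for example with an exponentially weighted moving average. The `SmoothedLoad` class below is a hypothetical sketch, not part of any implementation named here; `alpha` trades responsiveness for stability.

```python
class SmoothedLoad:
    """EWMA smoothing for a per-pod load signal (illustrative sketch).

    alpha near 1.0 tracks spikes quickly but passes noise through;
    alpha near 0.0 is stable but lags the spike it is meant to catch.
    """

    def __init__(self, alpha=0.3):
        self.alpha = alpha
        self.value = 0.0

    def observe(self, instantaneous_load):
        # Blend the new sample into the running estimate.
        self.value = (
            self.alpha * instantaneous_load + (1 - self.alpha) * self.value
        )
        return self.value
```

A single zero sample after a burst no longer drops the estimate to zero, so the router does not immediately flip all traffic back to a pod that was overloaded a moment ago.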

Implementation footprint

  • Google Maglev / GCLB — bounded-load is the default.
  • Envoy — the ring_hash and maglev load balancers accept a hash_balance_factor integer; valid values are 100 and above, where 200 corresponds to a 2× cap.
  • HAProxy — ships a native hash-balance-factor directive (since 1.7, contributed by Vimeo), which applies the same cap to its consistent-hash balancing.
  • Zalando Skipper — added as a first-class feature in skipper#1769 (Zalando's upstream contribution from PRAPI's deployment). Pairs with the fixed-100-position placement fix in skipper#1712 to get both stable cache locality and hot-key safety from the same ingress.
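For the Envoy entry above, enabling the cap is a one-field change on the cluster. A minimal sketch, assuming Envoy's v3 cluster API; the cluster name and endpoints are illustrative, and 200 encodes the 2× cap:

```yaml
# Fragment of an Envoy cluster config with bounded-load ring hashing.
clusters:
  - name: product_cache   # illustrative cluster name
    lb_policy: RING_HASH
    common_lb_config:
      consistent_hashing_lb_config:
        hash_balance_factor: 200   # 200 = cap each host at 2x the average
```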

Seen in
