PATTERN Cited by 2 sources
Power of Two Choices (P2C)¶
Power of Two Choices (P2C): instead of picking one backend uniformly at random, pick two at random and route the request to the one with fewer active requests (or lower observed load). It is a tiny change to pure random selection, but it reduces the maximum load on any backend exponentially, a well-known result from Mitzenmacher et al. (Harvard lecture notes).
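A minimal sketch of the selection rule, assuming the caller already keeps a per-backend count of active requests in a plain dict. The function name p2c_pick and the dict layout are hypothetical, not taken from any of the cited systems:

```python
import random
from typing import Mapping, Sequence


def p2c_pick(backends: Sequence[str], active: Mapping[str, int],
             rng: random.Random) -> str:
    """Power of Two Choices: sample two distinct backends, keep the less loaded."""
    if len(backends) == 1:
        return backends[0]
    # Two distinct candidates, chosen uniformly at random.
    a, b = rng.sample(backends, 2)
    # Route to whichever candidate has fewer active requests;
    # ties go to the first sampled candidate.
    return a if active.get(a, 0) <= active.get(b, 0) else b
```

Sampling two distinct backends is one design choice; the classic analysis samples independently (with replacement). With distinct sampling, the single most loaded backend is never chosen, because whichever backend it is paired with has a lower count.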
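Why it works: with pure random selection, bad luck can land several requests in a row on the same unlucky pod. With P2C, every decision compares two candidates, so a pod that has fallen behind is passed over whenever it is sampled alongside a less-loaded one, and local imbalances are corrected continuously. The math (N requests placed on N backends): the maximum load on any backend drops from about log N / log log N (random) to about log log N / log 2 (P2C), which grows so slowly that it is approximately constant for practical N.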
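The asymptotic gap is easy to check empirically. The following is a small balls-into-bins simulation, assuming N requests placed one at a time on N backends that never complete. That is the standard setting for the bound above, not a model of real RPC traffic, and max_load is a hypothetical helper:

```python
import random


def max_load(n: int, choices: int, seed: int = 0) -> int:
    """Place n requests on n backends; return the busiest backend's count.

    choices=1 is pure random placement; choices=2 is P2C (each request samples
    two backends independently and joins the one with the smaller count).
    """
    rng = random.Random(seed)
    load = [0] * n
    for _ in range(n):
        candidates = [rng.randrange(n) for _ in range(choices)]
        # min() keeps the first candidate on ties.
        target = min(candidates, key=load.__getitem__)
        load[target] += 1
    return max(load)


if __name__ == "__main__":
    n = 100_000
    print("random:", max_load(n, choices=1))
    print("p2c:   ", max_load(n, choices=2))
```

Running it for a few values of n shows the point of the bound: the random figure keeps creeping up with n, while the P2C figure stays small and nearly flat.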
When to use it¶
- Client-side LB where the caller has an observable "load" signal per backend (active requests is the usual one).
- Default strategy when you don't have a specific reason to use weighted, shard-aware, or consistent-hashing routing.
- Works well under bursty traffic because each decision is stateless and local: no coordination between clients is needed.
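A sketch of the client-side deployment model described in the list above, assuming a single-process Python client. The class P2CBalancer and its methods are hypothetical: each instance counts only the requests it has sent itself, so no state is shared or coordinated across clients.

```python
import random
import threading
from contextlib import contextmanager
from typing import Iterator, Optional, Sequence


class P2CBalancer:
    """Client-side P2C balancer; in-flight counts are local to this client."""

    def __init__(self, backends: Sequence[str], seed: Optional[int] = None) -> None:
        self._backends = list(backends)
        self._active = {b: 0 for b in self._backends}
        self._lock = threading.Lock()
        self._rng = random.Random(seed)

    def _pick(self) -> str:
        # Caller must hold self._lock.
        if len(self._backends) == 1:
            return self._backends[0]
        a, b = self._rng.sample(self._backends, 2)
        return a if self._active[a] <= self._active[b] else b

    @contextmanager
    def request(self) -> Iterator[str]:
        """Reserve a backend for one request; release it when the block exits."""
        with self._lock:
            backend = self._pick()
            self._active[backend] += 1
        try:
            yield backend
        finally:
            with self._lock:
                self._active[backend] -= 1

    def active(self, backend: str) -> int:
        with self._lock:
            return self._active[backend]


if __name__ == "__main__":
    lb = P2CBalancer(["pod-a", "pod-b", "pod-c"], seed=1)
    with lb.request() as backend:
        print("sending to", backend)  # the RPC would be issued here
```

Because counts are per client, each client's view of load is partial. The randomized pair sampling helps keep independent clients from all converging on the same backend, which is part of why the scheme works without coordination.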
When it's not enough¶
- You need stickiness (session affinity) → consistent hashing.
- You need shard awareness (request has to go to the owner of a particular key) → key-based dispatch.
- Workloads where the observable signal (active requests) doesn't reflect real load, e.g. long-tailed requests where one in-flight call can cost as much as 100 short calls. Mitigations: weight requests by expected cost, or compare a latency EWMA instead of the raw count.
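One way to apply the latency-EWMA mitigation, sketched under stated assumptions: the cost function ewma_latency * (inflight + 1), the smoothing factor alpha = 0.3, and the names EwmaP2C and BackendStats are illustrative choices, not a specific library's implementation. A backend that is slow or busy scores as expensive, so it loses the pairwise comparison.

```python
import random
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class BackendStats:
    inflight: int = 0
    ewma_latency_s: float = 0.0  # smoothed observed latency, in seconds
    initialized: bool = False


class EwmaP2C:
    """P2C that compares estimated cost instead of raw in-flight count."""

    def __init__(self, backends: Sequence[str], alpha: float = 0.3,
                 seed: Optional[int] = None) -> None:
        self.alpha = alpha
        self.stats: Dict[str, BackendStats] = {b: BackendStats() for b in backends}
        self._rng = random.Random(seed)

    def cost(self, backend: str) -> float:
        # Assumed cost model: expected latency scaled by queue depth.
        s = self.stats[backend]
        return s.ewma_latency_s * (s.inflight + 1)

    def pick(self) -> str:
        names = list(self.stats)
        chosen = names[0]
        if len(names) > 1:
            a, b = self._rng.sample(names, 2)
            chosen = a if self.cost(a) <= self.cost(b) else b
        self.stats[chosen].inflight += 1
        return chosen

    def record(self, backend: str, latency_s: float) -> None:
        """Call when a request finishes: release its slot and update the EWMA."""
        s = self.stats[backend]
        s.inflight -= 1
        if not s.initialized:
            s.ewma_latency_s = latency_s
            s.initialized = True
        else:
            s.ewma_latency_s = (self.alpha * latency_s
                                + (1 - self.alpha) * s.ewma_latency_s)
```

A backend with no completed requests yet has zero cost, so new backends are tried first; a production version would probably want a non-zero starting estimate to avoid flooding a cold backend.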
Seen in¶
- sources/2025-10-01-databricks-intelligent-kubernetes-load-balancing — P2C is Databricks' default client-side LB algorithm, embedded in their Armeria RPC library. Their claim: "P2C strikes a strong balance between performance and implementation simplicity, consistently leading to uniform traffic distribution across endpoints." More exotic strategies (CPU/metric-skewed routing) were tried and then retracted in favor of P2C plus direct health signals. Keeping it simple worked best.
- sources/2026-05-08-databricks-how-superhuman-and-databricks-built-a-200k-qps-inference-platform-together — P2C sustains 200,000+ QPS on managed external inference, with explicit framing that default Kubernetes round-robin degrades at this scale. The 2026-05-08 Databricks / Superhuman post promotes the same algorithm out of intra-cluster Armeria RPC (2025-10-01) and into the Databricks Model Serving stack, where it is the load-balancing substrate for Superhuman's 200K-QPS grammar-correction LLM. Quotes: "While the default Kubernetes round robin load balancer is sufficient at low QPS, our tests revealed that this performance degrades at higher QPS, with uneven request distribution creating hotspots that spike tail latency." And: "For each request, two candidate pods are sampled and traffic is routed to whichever has fewer active requests, preventing the hotspots that round-robin creates at high QPS." The "active requests" signal is the per-pod request_concurrency, the same primitive the autoscaler averages across pods for capacity decisions. One signal, two control loops: P2C reads it for routing, and the autoscaler reads it for scaling. The substrate is the EDS control plane, which watches the Kubernetes API for Services and EndpointSlices. This is the wiki's canonical production-validation datum for P2C at 200K QPS with sub-1s p99 latency and four nines, distinct from both the 2025-10-01 intra-cluster validation and the Zalando PRAPI batch-component validation. Stack-altitude extension: P2C is now canonicalised at three altitudes, namely intra-cluster service mesh (Databricks Armeria RPC), per-endpoint LB-strategy choice (Zalando PRAPI), and managed external inference (Databricks Model Serving).
- — Zalando's PRAPI Batch component uses P2C explicitly because stickiness is not required: the batch endpoint unpacks a multi-item request into concurrent single-item lookups and aggregates the results on return. "It uses the Power of Two Random Choices algorithm, routing requests to the less-loaded of two randomly selected pods." The Products component, by contrast, uses CHLB with bounded loads because it needs cache locality per product ID. Same cluster, two load-balancing algorithms picked per endpoint: P2C where stickiness is harmful, CHLB where stickiness is load-bearing.
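- Canonical implementations: Envoy's LEAST_REQUEST LB policy uses P2C under the hood. When all host weights are equal, it samples choice_count random hosts (2 by default) and picks the one with the fewest active requests.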
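The "one signal, two control loops" arrangement from the Superhuman source can be sketched as two functions reading the same per-pod concurrency map. The names are hypothetical and this is not Databricks' code. The scaling rule borrows the ratio documented for the Kubernetes Horizontal Pod Autoscaler, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), as an assumption about how such an autoscaler might size the fleet.

```python
import math
from typing import Dict


def p2c_choose(a: str, b: str, concurrency: Dict[str, int]) -> str:
    """Routing loop: of two sampled pods, pick the one with fewer in-flight requests."""
    return a if concurrency[a] <= concurrency[b] else b


def desired_replicas(concurrency: Dict[str, int], target_per_pod: float) -> int:
    """Scaling loop: average the same per-pod signal and size the fleet.

    Uses an HPA-style ratio: ceil(current_replicas * current_avg / target).
    """
    current = len(concurrency)
    avg = sum(concurrency.values()) / current
    return max(1, math.ceil(current * avg / target_per_pod))
```

The routing loop reacts per request to individual pod values, while the scaling loop reacts more slowly to their average; both stay consistent because they read one source of truth.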
Related¶
- concepts/client-side-load-balancing — the deployment model Databricks uses
- patterns/zone-affinity-routing — layered on top of P2C (P2C within the preferred zone, spilling over to other zones)
- concepts/layer-7-load-balancing
- concepts/feedback-control-load-balancing — the convergent (PID-style) alternative to P2C's reactive selection; complementary, not competing — P2C is stateless per-request while feedback control is stateful across-requests; running P2C inside feedback-controlled weights is a plausible composition. See systems/dropbox-robinhood for the feedback-control realization.
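The composition suggested above can be sketched as follows, assuming a hypothetical controller that publishes per-backend weights. The weights bias which two candidates are sampled, and P2C still makes the final per-request choice on active requests. This illustrates the idea only; it is not a description of systems/dropbox-robinhood, and weighted_p2c_pick is a hypothetical name.

```python
import random
from typing import Dict


def weighted_p2c_pick(weights: Dict[str, float], active: Dict[str, int],
                      rng: random.Random) -> str:
    """Sample two candidates in proportion to controller-owned weights,
    then route to the one with fewer active requests.

    `weights` is assumed to be adjusted slowly, across many requests, by a
    separate feedback controller; this function never modifies it.
    """
    names = list(weights)
    w = [weights[n] for n in names]
    # Sampling with replacement: a == b is allowed and simply returns a.
    a, b = rng.choices(names, weights=w, k=2)
    return a if active[a] <= active[b] else b
```

A weight of zero removes a backend from sampling entirely, which is one way a slow outer loop could drain a pod while the fast inner loop keeps balancing the rest.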