CONCEPT Cited by 2 sources

Client-side load balancing

Client-side load balancing means the caller chooses which backend instance to send a request to, rather than delegating that decision to a proxy (L7 sidecar, reverse proxy) or the kernel (L4 NAT, systems/kube-proxy). The caller holds a live view of healthy endpoints — typically streamed from a service-discovery control plane — and runs the LB algorithm in-process per request.

Sits at the opposite end of the design axis from sidecar-based service meshes: both do L7 load balancing, but put the routing decision in different places.

What makes it work

  1. A service-discovery feed the client can subscribe to — DNS doesn't cut it (no metadata, stale cache); concepts/xds-protocol, gRPC-LB, or a bespoke streaming API do.
  2. Shared client library across the fleet — otherwise every service has to re-implement subscription + LB, and staying in sync with the control plane becomes a coordination problem.
  3. An LB algorithm that works with per-request granularity and local information: patterns/power-of-two-choices is the common-case default; weighted / zone-aware / consistent-hashing are pluggable.
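The three ingredients compose into a small core: a locally held endpoint view, reconciled on each discovery push, with a power-of-two-choices pick over local in-flight counts. A minimal sketch — all class, method, and endpoint names here are invented for illustration, not from any of the cited systems:

```python
import random

class P2CBalancer:
    """Client-side LB core: a local view of healthy endpoints plus a
    power-of-two-choices pick over in-flight counts (names hypothetical)."""

    def __init__(self, endpoints):
        # endpoint name -> requests this client currently has in flight
        self.inflight = {ep: 0 for ep in endpoints}

    def on_discovery_push(self, endpoints):
        # Reconcile with a control-plane push: keep counts for surviving
        # endpoints, drop removed ones, start new ones at zero.
        self.inflight = {ep: self.inflight.get(ep, 0) for ep in endpoints}

    def pick(self):
        eps = list(self.inflight)
        if len(eps) == 1:
            choice = eps[0]
        else:
            # Sample two distinct endpoints, route to the less loaded one.
            a, b = random.sample(eps, 2)
            choice = a if self.inflight[a] <= self.inflight[b] else b
        self.inflight[choice] += 1
        return choice

    def release(self, ep):
        # Call on request completion so counts track live load.
        if ep in self.inflight:
            self.inflight[ep] -= 1
```

The shared-library requirement (ingredient 2) is exactly this file: every service in the fleet needs the same reconcile-then-pick loop, which is why re-implementing it per service becomes a coordination problem.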

Advantages over kernel-L4 and proxy-L7

  • Per-request decisions — breaks the "one pod picked per long-lived TCP connection" trap that kube-proxy hits with HTTP/2 / gRPC.
  • Richer signals. The client sees its own in-flight request counts, observed error rates, latency distribution to each endpoint — data a sidecar could approximate but a kernel NAT can't.
  • One less network hop. No sidecar proxy in the critical path; no CPU/memory/latency tax per pod.
  • Full metadata access. Zone labels, shards, readiness flags are visible to the LB algorithm; enables patterns/zone-affinity-routing, shard-aware routing, etc.
  • Cheap ops at scale. Avoids the "manage thousands of sidecars" problem a mesh introduces.
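The "richer signals" advantage is concrete: the caller can fold its own in-flight counts and an observed-latency EWMA into the pick, data a kernel NAT never sees. A hedged sketch — the scoring formula, alpha, and all names are illustrative choices, not from the cited sources:

```python
import random

class EndpointStats:
    """Per-endpoint signals only the caller observes directly:
    in-flight count and an exponentially weighted latency estimate."""

    def __init__(self, alpha=0.2):
        self.inflight = 0
        self.ewma_latency_ms = 1.0  # optimistic prior for new endpoints
        self.alpha = alpha

    def record(self, latency_ms):
        # EWMA update on each completed request.
        self.ewma_latency_ms += self.alpha * (latency_ms - self.ewma_latency_ms)

def score(stats):
    # Lower is better: expected latency scaled by local queue depth.
    return stats.ewma_latency_ms * (stats.inflight + 1)

def pick(endpoints):
    # Power-of-two-choices over locally observed scores
    # (endpoints: dict of name -> EndpointStats; names hypothetical).
    a, b = random.sample(list(endpoints), 2)
    return a if score(endpoints[a]) <= score(endpoints[b]) else b
```

A sidecar could approximate these signals from its own vantage point; the point is that in-process the data is free and per-request fresh.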

Tradeoffs / where it loses

  • Language coverage. The library ships per-language. Polyglot orgs either maintain N libraries or fall back to a sidecar. The proxyless model needs a dominant-language monoculture to be cheap. (Databricks' post names this explicitly as the argument for sidecar meshes "when you can't ship client libraries everywhere.")
  • Blast radius. A bug in the shared client library hits the entire fleet on the next redeploy. Sidecars are independently upgradable.
  • Control-plane freshness = correctness. If the EDS stream lags, every client routes on stale topology; DNS at least aged out stale entries via TTL expiry, however slowly.
  • Edge / non-app traffic stays outside the mesh. Non-library callers, ingress paths, batch jobs can't participate without another mechanism.

Seen in

  • sources/2025-10-01-databricks-intelligent-kubernetes-load-balancing — Databricks' proxyless architecture: xDS control plane (systems/databricks-endpoint-discovery-service) feeds an Armeria-embedded client-side LB library doing Power of Two Choices + zone-affinity-with-spillover. Delivered uniform QPS, stabilized P90 latency, ~20% pod-count reduction — explicitly preferred over Istio for a predominantly-Scala fleet.
  • sources/2024-10-28-dropbox-robinhood-in-house-load-balancing — Dropbox's Robinhood is a client-side-LB architecture with the control-plane-side algorithm being PID feedback, not an algorithm the client computes locally. Client library (Envoy / gRPC) reads endpoint weights from the routing DB via EDS and does weighted round-robin per request. This splits the "client side" concern: per-request routing is in the client (keeps the algorithm simple, never touched after launch), while weight derivation is in the LBS where it can evolve freely. Generalizable framing: in a client-side LB design, keep the client algorithm as invariant as possible (client upgrades take months) and do all policy evolution in the control plane (weight changes ship in minutes).
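Robinhood's division of labor — a deliberately boring client algorithm consuming control-plane-computed weights — can be sketched as a smooth weighted round-robin. The source says only "weighted round-robin", so the smoothing variant (the nginx-style one) and all names here are assumptions:

```python
class WeightedRoundRobin:
    """Invariant client side of a split LB design: pick() never changes;
    all policy evolution lands through set_weights() from the control plane."""

    def __init__(self, weights):
        self.weights = dict(weights)            # endpoint -> int weight
        self.current = {ep: 0 for ep in weights}

    def set_weights(self, weights):
        # Control-plane push (e.g. weights read via EDS): policy changes
        # ship here in minutes, while the client code below stays frozen.
        self.weights = dict(weights)
        self.current = {ep: self.current.get(ep, 0) for ep in weights}

    def pick(self):
        # Smooth WRR: spread each endpoint's share evenly across the cycle
        # instead of emitting all its picks back to back.
        total = sum(self.weights.values())
        for ep, w in self.weights.items():
            self.current[ep] += w
        best = max(self.current, key=self.current.get)
        self.current[best] -= total
        return best
```

The asymmetry in the source's framing is visible here: `pick()` is the part that takes months to roll out (client upgrades), `set_weights()` is the part that takes minutes (a control-plane push), so all the interesting policy lives behind the latter.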