
CONCEPT Cited by 2 sources

Layer 7 load balancing

Layer 7 load balancing makes routing decisions at the application protocol layer (HTTP headers, gRPC method, request path, query parameters) — and, critically, per request on multiplexed connections. Contrast with Layer 4 LB (systems/kube-proxy, classic IPVS, HAProxy in TCP mode, AWS NLB), which operates on TCP/UDP flows and picks a backend once per connection.

Why the layer matters

Modern protocols multiplex many logical requests over one long-lived connection:

  • HTTP/2: N concurrent streams per TCP connection.
  • gRPC: runs on HTTP/2; by default, all calls from one client → server share one connection.
  • HTTP/3 / QUIC: same idea over UDP.

L4 LB picks a backend when the TCP connection opens, then doesn't revisit. Every subsequent request on that connection lands on the same backend — regardless of that backend's current load. Across many clients, load skews hard:

  • Some pods get hot-spotted and saturate (tail latency blows up).
  • Other pods sit idle (over-provisioning cost).
  • Capacity planning gets unreliable — average utilization looks fine, p99 is broken.

L7 LB parses the protocol, and can pick a different backend per request. The load-per-pod distribution flattens and tail latency stabilizes.
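The skew is easy to reproduce in a toy model: pin every request from a connection to the backend chosen at connect time (L4), versus choosing a backend independently per request (L7). A minimal sketch — client counts, backend names, and request volumes are arbitrary illustrations, not from the source:

```python
import random
from collections import Counter

random.seed(7)  # deterministic for illustration

BACKENDS = [f"pod-{i}" for i in range(4)]

def l4_style(clients=9, requests_per_client=1000):
    """L4: backend chosen once at TCP connect; every request follows it."""
    counts = Counter()
    for _ in range(clients):
        backend = random.choice(BACKENDS)       # decision at connection open
        counts[backend] += requests_per_client  # all requests pinned to it
    return counts

def l7_style(clients=9, requests_per_client=1000):
    """L7: backend chosen independently for every request."""
    counts = Counter()
    for _ in range(clients):
        for _ in range(requests_per_client):
            counts[random.choice(BACKENDS)] += 1  # decision per request
    return counts

def spread(counts):
    """Max minus min requests across all backends (0 = perfectly flat)."""
    return max(counts.get(b, 0) for b in BACKENDS) - \
           min(counts.get(b, 0) for b in BACKENDS)

l4, l7 = l4_style(), l7_style()
print("L4 spread:", spread(l4))  # whole connections stack up on a few pods
print("L7 spread:", spread(l7))  # per-request choice averages the load out
```

With 9 connections over 4 backends the L4 spread is at least one full connection's worth of traffic (pigeonhole), while the L7 spread is just binomial noise — the "average looks fine, p99 is broken" failure mode in miniature.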

Where L7 LB gets implemented

  1. Sidecar proxy (Envoy + Istio) — transparent to the app, per-pod deployment, language-agnostic.
  2. Edge / ingress proxy — one tier at the cluster boundary, all ingress traffic routed through it.
  3. Client-side library (concepts/client-side-load-balancing) — the app's own RPC client picks per request using local signals.
  4. Hybrid — L4 LB to a pool of L7 proxies (common cloud-LB layout).

Richer decisions than L4 allows

Because L7 LB sees the application payload, it can route on:

  • Request content: route /api/v1/inference to GPU pods and /api/v1/healthz to anything.
  • Header-based sticky sessions for the subset of traffic that actually needs them.
  • Shard-aware routing (consistent hashing on a tenant ID).
  • Observed load per backend (P2C, least-request).
  • Topology: zone-aware routing, locality priority, etc.
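As a concrete instance of the observed-load bullet, here is a minimal power-of-two-choices (P2C) least-request picker. The class shape, backend names, and in-flight bookkeeping are illustrative, not any particular proxy's API:

```python
import random

class P2CBalancer:
    """Power-of-two-choices: sample two backends at random, send the
    request to the one with fewer in-flight requests. Near-optimal
    balance at O(1) cost per pick — no global scan of all backends."""

    def __init__(self, backends):
        self.in_flight = {b: 0 for b in backends}

    def pick(self):
        a, b = random.sample(list(self.in_flight), 2)
        choice = a if self.in_flight[a] <= self.in_flight[b] else b
        self.in_flight[choice] += 1   # request starts
        return choice

    def done(self, backend):
        self.in_flight[backend] -= 1  # request finishes

lb = P2CBalancer(["pod-a", "pod-b", "pod-c"])
backend = lb.pick()
# ... send the request to `backend`, then:
lb.done(backend)
```

Note this only works as an L7 strategy: the picker must run once per request, which requires seeing request boundaries inside the multiplexed connection.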

Seen in

  • sources/2025-10-01-databricks-intelligent-kubernetes-load-balancing — Databricks' explicit motivation for moving from kube-proxy L4 LB to a client-side L7 LB architecture. Layer-4 "decision once per TCP connection" was incompatible with long-lived gRPC/HTTP/2 traffic; the fix was to move LB up to Layer 7 per-request.
  • sources/2024-10-28-dropbox-robinhood-in-house-load-balancing — Robinhood (Dropbox's in-house LB) does per-request weighted round-robin at L7 (Envoy + gRPC clients), using weights a PID controller computes per node, per service. Same L7-is-required-for-HTTP/2 motivation as Databricks, but where Databricks uses L7 to implement P2C (an open-loop algorithm with richer signals), Dropbox uses L7 to implement feedback control. L7 is the substrate; the interesting question is what algorithm it serves.
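The weighted-round-robin half of that design can be sketched independently of the controller. This is nginx-style smooth WRR; the weights below are made-up stand-ins for what a per-node, per-service controller would emit:

```python
class SmoothWRR:
    """Smooth weighted round-robin (nginx-style): on every pick, each
    backend gains its weight; the leader is chosen and pays back the
    weight total, so picks interleave in proportion to weight."""

    def __init__(self, weights):           # {backend: weight}
        self.weights = dict(weights)
        self.current = {b: 0 for b in weights}

    def pick(self):
        total = sum(self.weights.values())
        for b, w in self.weights.items():
            self.current[b] += w
        choice = max(self.current, key=self.current.get)
        self.current[choice] -= total
        return choice

# Hypothetical weights; in Dropbox's design the PID controller would
# recompute these from observed load, closing the feedback loop.
lb = SmoothWRR({"node-1": 5, "node-2": 1, "node-3": 1})
picks = [lb.pick() for _ in range(7)]
print(picks)  # node-1 gets 5 of the 7 slots, interleaved with the others
```

The PID controller itself is out of scope here; the point is that re-running `pick()` per request is only possible once the balancer operates at L7.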