CONCEPT Cited by 2 sources
Layer 7 load balancing¶
Layer 7 load balancing makes routing decisions at the application protocol layer (HTTP headers, gRPC method, request path, query parameters) — and, critically, per request on multiplexed connections. Contrast with Layer 4 LB (systems/kube-proxy, classic IPVS / HAProxy in TCP mode / AWS NLB), which operates on TCP/UDP packets and picks a backend once per connection.
Why the layer matters¶
Modern protocols multiplex many logical requests over one long-lived connection:
- HTTP/2: N concurrent streams per TCP connection.
- gRPC: runs on HTTP/2; by default, all calls from one client to a given server share one connection.
- HTTP/3 / QUIC: same idea over UDP.
L4 LB picks a backend when the TCP connection opens, then doesn't revisit. Every subsequent request on that connection lands on the same backend — regardless of that backend's current load. Across many clients, load skews hard:
- Some pods get hot-spotted and saturate (tail latency blows up).
- Other pods sit idle (over-provisioning cost).
- Capacity planning gets unreliable — average utilization looks fine, p99 is broken.
L7 LB parses the protocol, and can pick a different backend per request. The load-per-pod distribution flattens and tail latency stabilizes.
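A toy simulation makes the skew concrete. The workload numbers below are made up for illustration: a few long-lived connections carry most of the requests, and connection-time round-robin (L4) hot-spots whichever backends those connections landed on, while per-request round-robin (L7) flattens the distribution.

```python
import itertools

BACKENDS = 4
# Hypothetical workload: 8 clients, one long-lived HTTP/2 connection each;
# request counts per connection are heavily skewed.
requests_per_conn = [100, 1, 1, 1, 50, 1, 1, 1]

# L4: backend chosen once per connection (round-robin at connect time);
# every request on that connection lands on the same backend.
l4_load = [0] * BACKENDS
for conn, n in enumerate(requests_per_conn):
    l4_load[conn % BACKENDS] += n

# L7: backend chosen per request (round-robin across all requests).
l7_load = [0] * BACKENDS
rr = itertools.cycle(range(BACKENDS))
for n in requests_per_conn:
    for _ in range(n):
        l7_load[next(rr)] += 1

print("L4 per-backend load:", l4_load)  # [150, 2, 2, 2]
print("L7 per-backend load:", l7_load)  # [39, 39, 39, 39]
```

Same 156 requests either way; only the decision point differs.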
Where L7 LB gets implemented¶
- Sidecar proxy (Envoy + Istio) — transparent to the app, per-pod deployment, language-agnostic.
- Edge / ingress proxy — one tier at the cluster boundary, all ingress traffic routed through it.
- Client-side library — concepts/client-side-load-balancing; the app's own RPC client picks per request using local signals.
- Hybrid — L4 LB to a pool of L7 proxies (common cloud-LB layout).
Richer decisions than L4 allows¶
Because L7 LB sees the application payload, it can route on:
- Request content: route `/api/v1/inference` to GPU pods and `/api/v1/healthz` to anything.
- Header-based sticky sessions for the subset of traffic that actually needs them.
- Shard-aware routing (consistent hashing on a tenant ID).
- Observed load per backend (P2C, least-request).
- Topology: zone-aware routing, locality priority, etc.
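The least-request flavor of P2C from the list above fits in a few lines. This is a generic sketch, not any particular proxy's implementation; the in-flight counts are hypothetical:

```python
import random

def pick_p2c(outstanding, rng=random):
    """Least-request power-of-two-choices: sample two distinct backends,
    send the request to whichever has fewer in-flight requests."""
    a, b = rng.sample(range(len(outstanding)), 2)
    return a if outstanding[a] <= outstanding[b] else b

# Hypothetical in-flight counts: backend 1 is idle, backend 2 overloaded.
outstanding = [5, 0, 9, 2]
counts = [0] * len(outstanding)
rng = random.Random(0)
for _ in range(1000):
    counts[pick_p2c(outstanding, rng)] += 1
# The overloaded backend never wins a pairing; the idle one wins every
# pairing it appears in, so it collects the most picks.
```

An L4 LB cannot run this loop at all: it has no per-request decision point and no view of per-backend outstanding requests.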
Seen in¶
- sources/2025-10-01-databricks-intelligent-kubernetes-load-balancing — Databricks' explicit motivation for moving from kube-proxy L4 LB to a client-side L7 LB architecture. Layer-4 "decision once per TCP connection" was incompatible with long-lived gRPC/HTTP/2 traffic; the fix was to move LB up to Layer 7 per-request.
- sources/2024-10-28-dropbox-robinhood-in-house-load-balancing — Robinhood, Dropbox's in-house load balancer, does per-request weighted round-robin at L7 (Envoy + gRPC clients) on weights a PID controller computes per node per service. Same L7-is-required-for-HTTP/2 motivation as Databricks, but where Databricks uses L7 to implement P2C (an open-loop algorithm with richer signals), Dropbox uses L7 to implement feedback control. L7 is the substrate; the interesting question is what algorithm it serves.
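The controller side is out of scope here, but the per-request weighted round-robin such weights feed can be sketched with the smooth-WRR algorithm (nginx's variant; that Dropbox uses exactly this interleaving is an assumption, and the weights below are made up):

```python
def smooth_wrr(weights):
    """Smooth weighted round-robin: each pick, add every backend's weight
    to its running score, choose the max, then subtract the weight total
    from the winner. Picks interleave instead of bursting on one backend."""
    current = [0] * len(weights)
    total = sum(weights)
    while True:
        current = [c + w for c, w in zip(current, weights)]
        i = current.index(max(current))
        current[i] -= total
        yield i

it = smooth_wrr([5, 1, 1])  # weights a controller might emit per backend
picks = [next(it) for _ in range(7)]
print(picks)  # [0, 0, 1, 0, 2, 0, 0]
```

Pick counts over the cycle match the weights (5:1:1), but backend 0's picks are spread out rather than issued back-to-back.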
Related¶
- concepts/client-side-load-balancing — one implementation strategy
- systems/envoy — the reference L7 proxy
- systems/kube-proxy — the L4 default being replaced
- systems/grpc — the workload that most often forces the move
- patterns/power-of-two-choices