
CONCEPT Cited by 2 sources

Layer 7 load balancing

Layer 7 load balancing makes routing decisions at the application protocol layer (HTTP headers, gRPC method, request path, query parameters) — and, critically, per request on multiplexed connections. Contrast with Layer 4 LB (systems/kube-proxy, classic IPVS, HAProxy in TCP mode, AWS NLB), which operates on TCP/UDP flows and picks a backend once per connection.

Why the layer matters

Modern protocols multiplex many logical requests over one long-lived connection:

  • HTTP/2: N concurrent streams per TCP connection.
  • gRPC: runs on HTTP/2; by default, all calls from one client → server share one connection.
  • HTTP/3 / QUIC: same idea over UDP.

L4 LB picks a backend when the TCP connection opens, then doesn't revisit. Every subsequent request on that connection lands on the same backend — regardless of that backend's current load. Across many clients, load skews hard:

  • Some pods get hot-spotted and saturate (tail latency blows up).
  • Other pods sit idle (over-provisioning cost).
  • Capacity planning gets unreliable — average utilization looks fine, p99 is broken.

L7 LB parses the protocol, and can pick a different backend per request. The load-per-pod distribution flattens and tail latency stabilizes.
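The skew is easy to reproduce in a toy model: pin every request from a connection to the backend chosen at connect time (L4), versus choosing a backend independently per request (L7). A minimal sketch — client counts, backend names, and request volumes are arbitrary illustrations, not from the source:

```python
import random
from collections import Counter

random.seed(7)  # deterministic for illustration

BACKENDS = [f"pod-{i}" for i in range(4)]

def l4_style(clients=9, requests_per_client=1000):
    """L4: backend chosen once at TCP connect; every request follows it."""
    counts = Counter()
    for _ in range(clients):
        backend = random.choice(BACKENDS)       # decision at connection open
        counts[backend] += requests_per_client  # all requests pinned to it
    return counts

def l7_style(clients=9, requests_per_client=1000):
    """L7: backend chosen independently for every request."""
    counts = Counter()
    for _ in range(clients):
        for _ in range(requests_per_client):
            counts[random.choice(BACKENDS)] += 1  # decision per request
    return counts

def spread(counts):
    """Max minus min requests across all backends (0 = perfectly flat)."""
    return max(counts.get(b, 0) for b in BACKENDS) - \
           min(counts.get(b, 0) for b in BACKENDS)

l4, l7 = l4_style(), l7_style()
print("L4 spread:", spread(l4))  # whole connections stack up on a few pods
print("L7 spread:", spread(l7))  # per-request choice averages the load out
```

With 9 connections over 4 backends the L4 spread is at least one full connection's worth of traffic (pigeonhole), while the L7 spread is just binomial noise — the "average looks fine, p99 is broken" failure mode in miniature.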

Where L7 LB gets implemented

  1. Sidecar proxy (Envoy + Istio) — transparent to the app, per-pod deployment, language-agnostic.
  2. Edge / ingress proxy — one tier at the cluster boundary, all ingress traffic routed through it.
  3. Client-side library (concepts/client-side-load-balancing) — the app's own RPC client picks per request using local signals.
  4. Hybrid — L4 LB to a pool of L7 proxies (common cloud-LB layout).

Richer decisions than L4 allows

Because L7 LB sees the application payload, it can route on:

  • Request content: route /api/v1/inference to GPU pods and /api/v1/healthz to anything.
  • Header-based sticky sessions for the subset of traffic that actually needs them.
  • Shard-aware routing (consistent hashing on a tenant ID).
  • Observed load per backend (P2C, least-request).
  • Topology: zone-aware routing, locality priority, etc.
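As a concrete instance of the observed-load bullet, here is a minimal power-of-two-choices (P2C) least-request picker. The class shape, backend names, and in-flight bookkeeping are illustrative, not any particular proxy's API:

```python
import random

class P2CBalancer:
    """Power-of-two-choices: sample two backends at random, send the
    request to the one with fewer in-flight requests. Near-optimal
    balance at O(1) cost per pick — no global scan of all backends."""

    def __init__(self, backends):
        self.in_flight = {b: 0 for b in backends}

    def pick(self):
        a, b = random.sample(list(self.in_flight), 2)
        choice = a if self.in_flight[a] <= self.in_flight[b] else b
        self.in_flight[choice] += 1   # request starts
        return choice

    def done(self, backend):
        self.in_flight[backend] -= 1  # request finishes

lb = P2CBalancer(["pod-a", "pod-b", "pod-c"])
backend = lb.pick()
# ... send the request to `backend`, then:
lb.done(backend)
```

Note this only works as an L7 strategy: the picker must run once per request, which requires seeing request boundaries inside the multiplexed connection.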

Seen in

  • sources/2025-10-01-databricks-intelligent-kubernetes-load-balancing — Databricks' explicit motivation for moving from kube-proxy L4 LB to a client-side L7 LB architecture. Layer-4 "decision once per TCP connection" was incompatible with long-lived gRPC/HTTP/2 traffic; the fix was to move LB up to Layer 7 per-request.
  • sources/2024-10-28-dropbox-robinhood-in-house-load-balancing — Robinhood (Dropbox's in-house LB) does per-request weighted round-robin at L7 (Envoy + gRPC clients), using weights a PID controller computes per node, per service. Same L7-is-required-for-HTTP/2 motivation as Databricks, but where Databricks uses L7 to implement P2C (an open-loop algorithm with richer signals), Dropbox uses L7 to implement feedback control. L7 is the substrate; the interesting question is what algorithm it serves.
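The weighted-round-robin half of that design can be sketched independently of the controller. This is nginx-style smooth WRR; the weights below are made-up stand-ins for what a per-node, per-service controller would emit:

```python
class SmoothWRR:
    """Smooth weighted round-robin (nginx-style): on every pick, each
    backend gains its weight; the leader is chosen and pays back the
    weight total, so picks interleave in proportion to weight."""

    def __init__(self, weights):           # {backend: weight}
        self.weights = dict(weights)
        self.current = {b: 0 for b in weights}

    def pick(self):
        total = sum(self.weights.values())
        for b, w in self.weights.items():
            self.current[b] += w
        choice = max(self.current, key=self.current.get)
        self.current[choice] -= total
        return choice

# Hypothetical weights; in Dropbox's design the PID controller would
# recompute these from observed load, closing the feedback loop.
lb = SmoothWRR({"node-1": 5, "node-2": 1, "node-3": 1})
picks = [lb.pick() for _ in range(7)]
print(picks)  # node-1 gets 5 of the 7 slots, interleaved with the others
```

The PID controller itself is out of scope here; the point is that re-running `pick()` per request is only possible once the balancer operates at L7.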