kube-proxy¶
kube-proxy is the Kubernetes node-level component that implements the default Service load balancing. It watches the Kubernetes API for Service and EndpointSlice changes and programs kernel-level rules (iptables, IPVS, or nftables, depending on the proxy mode) so that packets sent to a Service's ClusterIP virtual IP are rewritten (DNAT) to one of the backend pod IPs.
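As a rough illustration of that watch-and-program loop, here is a minimal client-go sketch (assumed names and printing only; real kube-proxy also watches EndpointSlices and writes actual iptables/IPVS rules rather than logging):

```go
package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	factory := informers.NewSharedInformerFactory(client, 30*time.Second)
	svcInformer := factory.Core().V1().Services().Informer()

	// On every Service change, a real proxy would resync kernel rules so the
	// ClusterIP DNATs to the current set of ready backend pod IPs.
	svcInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			svc := obj.(*corev1.Service)
			fmt.Printf("sync rules for %s/%s (ClusterIP %s)\n",
				svc.Namespace, svc.Name, svc.Spec.ClusterIP)
		},
		UpdateFunc: func(_, obj interface{}) { /* resync rules */ },
		DeleteFunc: func(obj interface{}) { /* remove rules */ },
	})

	stop := make(chan struct{})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	select {} // keep watching
}
```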
How it works (ClusterIP Services)¶
- Layer 4 only. Decisions happen on the kernel's packet path; kube-proxy doesn't parse HTTP / gRPC.
- Per-connection decision. The backend pod is picked once when the TCP connection is established; every subsequent packet on that connection goes to the same pod (sketched in the snippet after this list).
- Basic algorithms. Random selection (iptables mode) or round-robin by default (IPVS mode); no weights, no topology awareness, no per-request decisions.
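A minimal sketch of that per-connection behavior (not kube-proxy's code; in practice the kernel's conntrack table plays the role of the map below):

```go
package main

import (
	"fmt"
	"math/rand"
)

// backends stands in for the ready endpoints behind one ClusterIP.
var backends = []string{"pod-a", "pod-b", "pod-c"}

// connTable mimics conntrack: once a new flow is DNAT'ed to a backend,
// every later packet on that connection keeps hitting the same backend.
var connTable = map[string]string{}

func pickBackend(connID string) string {
	if b, ok := connTable[connID]; ok {
		return b // existing connection: sticky
	}
	b := backends[rand.Intn(len(backends))] // new connection: random pick
	connTable[connID] = b
	return b
}

func main() {
	// One long-lived HTTP/2 connection: every request rides the same flow,
	// so every request lands on the same pod.
	for i := 0; i < 5; i++ {
		fmt.Println("conn-1 request", i, "->", pickBackend("conn-1"))
	}
}
```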
Why this fails for gRPC / HTTP/2¶
gRPC runs on long-lived HTTP/2 connections. Because kube-proxy picks a pod per connection, not per request:
- A client that has one HTTP/2 connection to service X keeps sending every request to the same backend pod, regardless of that pod's load.
- Across many clients, the aggregate distribution is skewed: some pods get orders of magnitude more traffic than others (see the toy simulation after this list).
- Tail latency rises as hot pods saturate, and capacity planning becomes guesswork: average utilization looks fine while p99 blows up.
- kube-proxy can't fix this: it is architecturally L4 and makes its decision per connection, not per request.
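A toy simulation of the aggregate effect (illustrative numbers only, not from the source): each client pins one long-lived connection to a random pod, clients have uneven request rates, and per-pod totals drift far apart compared with per-request balancing.

```go
package main

import (
	"fmt"
	"math/rand"
)

func main() {
	const pods = 3
	const clients = 12

	perConn := make([]int, pods) // requests per pod with connection pinning
	perReq := make([]int, pods)  // requests per pod with per-request balancing

	for c := 0; c < clients; c++ {
		pinned := rand.Intn(pods)        // kube-proxy-style pick at connect time
		requests := 10 + rand.Intn(1000) // some clients are far chattier than others
		perConn[pinned] += requests
		for r := 0; r < requests; r++ {
			perReq[rand.Intn(pods)]++ // what an L7 / per-request balancer would do
		}
	}

	fmt.Println("per-connection pinning:", perConn)
	fmt.Println("per-request balancing: ", perReq)
}
```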
The standard workarounds push LB up to L7: concepts/client-side-load-balancing, concepts/layer-7-load-balancing, or a sidecar/edge proxy.
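For the client-side route, one common pattern (a sketch; the service name and port are hypothetical, and it assumes a headless Service so DNS returns per-pod addresses rather than the ClusterIP) is to let grpc-go resolve all backends and spread RPCs across them:

```go
package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// Hypothetical headless Service: DNS resolves to every ready pod IP,
	// so the gRPC client can hold one subchannel per backend.
	target := "dns:///my-service-headless.default.svc.cluster.local:50051"

	conn, err := grpc.Dial(target,
		// Spread RPCs round-robin across the resolved backends instead of
		// pinning everything to a single HTTP/2 connection.
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
		grpc.WithTransportCredentials(insecure.NewCredentials()),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	// conn can now be shared; create stubs from it as usual.
}
```

The Databricks setup cited below swaps the DNS resolver for a custom xDS control plane, but the in-client, per-backend-subchannel model is the same idea.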
Seen in¶
- sources/2025-10-01-databricks-intelligent-kubernetes-load-balancing — Databricks cites kube-proxy's per-connection L4 model as the root cause of traffic skew on their Scala/gRPC fleet; they bypass it entirely in favor of client-side LB via a custom xDS control plane (systems/databricks-endpoint-discovery-service).