

Control-plane fan-out to the Kubernetes API

Definition

Control-plane fan-out to the Kubernetes API is the scaling anti-pattern where N identical data-plane pods each independently poll or watch the API server for the same resources, producing roughly N× load on the API server and etcd as a function of data-plane replica count. At a critical N the API server CPU-throttles, etcd saturates, and operations that depend on the control plane — pod scheduling, rolling updates, controller reconciliation — start to fail.

Canonical instance

Zalando's Skipper ingress deployment (Source: sources/2025-02-16-zalando-scaling-beyond-limits-harnessing-route-server-for-a-stable-cluster). Each Skipper pod independently polled the Kubernetes API for Ingress and RouteGroup resources. At roughly 180 Skipper pods per cluster across 200 clusters, polling for ~15k Ingresses and ~5k RouteGroups:

  • etcd was overwhelmed (the post's wording).
  • API server CPU was throttled.
  • "Our clusters lost the ability to schedule new pods effectively, and existing pod management operations began to fail."

The failure manifested on the scheduler, not on the ingress data plane — the ingress fleet's growth curve exported its load onto the shared control plane.

Why it's structural, not operational

The load isn't per-request; it's per-replica. Scaling the data plane (horizontally, because request load grew) automatically scales load on the control plane, which is not provisioned for the data plane's request curve. API server / etcd are sized for control-plane traffic (deploys, controllers), typically orders of magnitude lower than data-plane concurrency. Any data-plane deployment that treats the API server as a live config store hits this wall as it scales.
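The replica-coupled load is one line of arithmetic. A minimal sketch (the function is illustrative; the 3-second interval is borrowed from routesrv's cadence as described below, and the pod count from the incident):

```python
def control_plane_request_rate(replicas: int, poll_interval_s: float) -> float:
    """Aggregate API-server poll rate produced by N identical data-plane pods.

    Each replica polls independently, so control-plane load grows linearly
    with replica count -- not with data-plane request traffic.
    """
    return replicas / poll_interval_s

# Illustrative numbers: ~180 Skipper pods per cluster, each polling every 3s.
per_cluster = control_plane_request_rate(180, 3.0)  # 60 requests/s per cluster
# With a coalescing proxy, only one process polls the API server.
with_proxy = control_plane_request_rate(1, 3.0)     # ~0.33 requests/s
```

The point of the sketch is that `replicas` is the only variable the data plane controls, and it is exactly the variable that horizontal scaling increases.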

Escape hatches

  1. Coalescing proxy — insert one process that polls or watches the API at a fixed cadence and serves all N pods via HTTP + ETag conditional polling or a push stream. Converts N× on etcd into 1× on etcd and N× on a much cheaper HTTP tier. Canonical shape: patterns/control-plane-proxy-with-etag-cache. Zalando's Route Server is the canonical instance — one routesrv deployment serves up to 300 Skipper pods at 3-second intervals, pushing load on etcd back into noise.
  2. Shared informer cache per process — library-level coalescing inside a multi-consumer pod. Doesn't help when each pod is a separate replica.
  3. xDS-style push stream — control plane streams deltas to each data-plane client (Envoy + xDS). Solves the fan-out, but requires a full xDS-class implementation.
  4. Event bus — write API-derived state to a Kafka / Redis topic and have data-plane pods subscribe. Extra infra; only justified at very high fan-in / fan-out.

What doesn't work

Kubernetes Informers (push-based watches from the API server to each pod) do not solve this pattern at Zalando's scale (Source: sources/2025-02-16-zalando-scaling-beyond-limits-harnessing-route-server-for-a-stable-cluster). Informers shift the work from N× poll to N× watch-push, but the API server still talks to all N clients on every change. Change events then reproduce the thundering-herd shape against etcd / API server, and "HPA won't be able to catch up and scale Kubernetes API and etcd" in response.

Signals that you're in this failure mode

  • Data-plane replica count is in the low hundreds per cluster.
  • etcd p99 latency / request rate trending with replica count, not with request load.
  • API server CPU throttling or request-queue saturation, visible in the API server's own metrics.
  • Pod scheduling / rolling-update / webhook calls slow down correlated with data-plane rollouts, not user traffic.
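These signals can be watched directly. A hedged PromQL sketch (metric names vary by Kubernetes version and exporter setup; the `skipper-ingress` deployment name is an assumption):

```promql
# etcd request latency as seen by the API server (p99). If this trends with
# data-plane replica count rather than user traffic, you are in the fan-out mode.
histogram_quantile(0.99,
  sum by (le) (rate(etcd_request_duration_seconds_bucket[5m])))

# The replica count to correlate against (kube-state-metrics).
kube_deployment_status_replicas{deployment="skipper-ingress"}

# API server CPU throttling (cAdvisor).
rate(container_cpu_cfs_throttled_periods_total{container="kube-apiserver"}[5m])
```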
