

Control-plane fan-out to the Kubernetes API

Definition

Control-plane fan-out to the Kubernetes API is the scaling anti-pattern where N identical data-plane pods each independently poll or watch the API server for the same resources, producing roughly N× load on the API server and etcd as a function of data-plane replica count. At a critical N the API server CPU-throttles, etcd saturates, and operations that depend on the control plane — pod scheduling, rolling updates, controller reconciliation — start to fail.

Canonical instance

Zalando's Skipper ingress deployment (Source: sources/2025-02-16-zalando-scaling-beyond-limits-harnessing-route-server-for-a-stable-cluster). Each Skipper pod independently polled the Kubernetes API for Ingress and RouteGroup resources. At roughly 180 Skipper pods per cluster across 200 clusters, polling for ~15k Ingresses and ~5k RouteGroups:

  • etcd was overwhelmed (the post's wording).
  • API server CPU was throttled.
  • "Our clusters lost the ability to schedule new pods effectively, and existing pod management operations began to fail."

The failure manifested on the scheduler, not on the ingress data plane — the ingress fleet's growth curve exported its load onto the shared control plane.

Why it's structural, not operational

The load isn't per-request; it's per-replica. Scaling the data plane (horizontally, because request load grew) automatically scales load on the control plane, which is not provisioned for the data plane's request curve. API server / etcd are sized for control-plane traffic (deploys, controllers), typically orders of magnitude lower than data-plane concurrency. Any data-plane deployment that treats the API server as a live config store hits this wall as it scales.
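The replica-coupled load is one line of arithmetic. A minimal sketch (the function is illustrative; the 3-second interval is borrowed from routesrv's cadence as described below, and the pod count from the incident):

```python
def control_plane_request_rate(replicas: int, poll_interval_s: float) -> float:
    """Aggregate API-server poll rate produced by N identical data-plane pods.

    Each replica polls independently, so control-plane load grows linearly
    with replica count -- not with data-plane request traffic.
    """
    return replicas / poll_interval_s

# Illustrative numbers: ~180 Skipper pods per cluster, each polling every 3s.
per_cluster = control_plane_request_rate(180, 3.0)  # 60 requests/s per cluster
# With a coalescing proxy, only one process polls the API server.
with_proxy = control_plane_request_rate(1, 3.0)     # ~0.33 requests/s
```

The point of the sketch is that `replicas` is the only variable the data plane controls, and it is exactly the variable that horizontal scaling increases.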

Escape hatches

  1. Coalescing proxy — insert one process that polls or watches the API at a fixed cadence and serves all N pods via HTTP + ETag conditional polling or a push stream. Converts N× on etcd into 1× on etcd and N× on a much cheaper HTTP tier. Canonical shape: patterns/control-plane-proxy-with-etag-cache. Zalando's Route Server is the canonical instance — one routesrv deployment serves up to 300 Skipper pods at 3-second intervals, pushing load on etcd back into noise.
  2. Shared informer cache per process — library-level coalescing inside a multi-consumer pod. Doesn't help when each pod is a separate replica.
  3. xDS-style push stream — control plane streams deltas to each data-plane client (Envoy + xDS). Solves the fan-out, but requires a full xDS-class implementation.
  4. Event bus — write API-derived state to a Kafka / Redis topic and have data-plane pods subscribe. Extra infra; only justified at very high fan-in / fan-out.

What doesn't work

Kubernetes Informers (push-based watches from the API server to each pod) do not solve this pattern at Zalando's scale (Source: sources/2025-02-16-zalando-scaling-beyond-limits-harnessing-route-server-for-a-stable-cluster). Informers shift the work from N× poll to N× watch-push, but the API server still talks to all N clients on every change. Change events then reproduce the thundering-herd shape against etcd / API server, and "HPA won't be able to catch up and scale Kubernetes API and etcd" in response.

Signals that you're in this failure mode

  • Data-plane replica count is in the low hundreds per cluster.
  • etcd p99 latency / request rate trending with replica count, not with request load.
  • API server CPU throttling or request-queue saturation, visible in the API server's own metrics.
  • Pod scheduling / rolling-update / webhook calls slow down correlated with data-plane rollouts, not user traffic.
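These signals can be watched directly. A hedged PromQL sketch (metric names vary by Kubernetes version and exporter setup; the `skipper-ingress` deployment name is an assumption):

```promql
# etcd request latency as seen by the API server (p99). If this trends with
# data-plane replica count rather than user traffic, you are in the fan-out mode.
histogram_quantile(0.99,
  sum by (le) (rate(etcd_request_duration_seconds_bucket[5m])))

# The replica count to correlate against (kube-state-metrics).
kube_deployment_status_replicas{deployment="skipper-ingress"}

# API server CPU throttling (cAdvisor).
rate(container_cpu_cfs_throttled_periods_total{container="kube-apiserver"}[5m])
```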
