PATTERN
Control-plane proxy with ETag cache¶
Intent¶
Decouple a large data-plane fleet from a shared upstream config source (Kubernetes API, service registry, auth server, any config store) by inserting a single coalescing proxy between them. The proxy polls or watches the upstream at one cadence and serves downstream pods over HTTP using ETag conditional polling. This converts an N× fan-out on the shared upstream into a 1× poll plus an N× 304-gated delta channel on a cheap HTTP tier.
Structure¶
                                  ┌────────────────┐
                                  │ Data-plane pod │
                                  │   (Skipper)    │
                                  └───────▲────────┘
                                          │ HTTP + ETag
                                          │ (every Δ)
┌──────────────┐     poll @ Δ     ┌───────┴────────────┐
│ Upstream     │ ◄─────────────── │ Coalescing proxy   │
│ (K8s API,    │                  │ (Route Server)     │ ◄── ... N more clients
│ etcd, ...)   │                  │ - parse + compile  │
└──────────────┘                  │ - compute ETag     │
                                  │ - serve /routes    │
                                  └────────────────────┘
Exactly one connection against the upstream; N cheap HTTP
connections against the proxy.
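On the proxy side, the request path reduces to "hash the compiled table, compare with If-None-Match". A minimal sketch of that logic in Python — function names and the hash truncation are illustrative, not routesrv's actual implementation:

```python
import hashlib


def table_etag(table):
    """One ETag over the whole routing table (table-wide granularity)."""
    return '"' + hashlib.sha256(table).hexdigest()[:16] + '"'


def serve_routes(table, if_none_match=None):
    """Handle GET /routes: 304 when the client's cached ETag still matches."""
    etag = table_etag(table)
    if if_none_match == etag:
        # Cheap path: no body, no re-parse downstream.
        return 304, {"ETag": etag}, b""
    # Full table plus the new ETag for the client to cache.
    return 200, {"ETag": etag}, table
```

A first fetch returns 200 with the table; repeating the request with the returned ETag yields a body-less 304 until the table actually changes.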
Canonical instance¶
Zalando's Route Server (routesrv) (Source: sources/2025-02-16-zalando-scaling-beyond-limits-harnessing-route-server-for-a-stable-cluster):
- Upstream: Kubernetes API (Ingress + RouteGroup resources).
- Proxy: Route Server — polls the API every 3 seconds, parses into Eskip, computes a table-wide ETag, serves GET /routes with HTTP 200 or 304.
- Data plane: ~300 Skipper pods per cluster.
- Result: ~180 Skippers polling etcd directly (enough load to break pod scheduling) → 1× polling on etcd + ~100 rps of cheap 304s from Route Server to Skipper. HPA ceiling raised from 180 to 300 pods with zero downtime and zero GMV loss.
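The data-plane side of the protocol is a conditional-GET loop that also gives the last-known-good behaviour described under Trade-offs below. A sketch, assuming a fetch(etag) helper that issues GET /routes with If-None-Match and returns (status, new_etag, body) — these names are illustrative, not Skipper's real API:

```python
import time


def poll_loop(fetch, apply_table, interval=3.0, iterations=None):
    """Client loop: conditional GET every `interval` seconds."""
    etag = None
    polls = 0
    while iterations is None or polls < iterations:
        try:
            status, new_etag, body = fetch(etag)
        except OSError:
            # Proxy unreachable: keep routing with the last-known-good table.
            status = None
        if status == 200:
            apply_table(body)   # swap in the freshly fetched table
            etag = new_etag     # cache its ETag for the next poll
        # A 304 (or an outage) leaves the current table in place.
        polls += 1
        time.sleep(interval)
    return etag
```

Note that a proxy outage lands in the same branch as a 304: the pod keeps serving its current table, so the failure is a freshness regression rather than a traffic outage.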
When to apply¶
- Data-plane replica count is high (hundreds+) and each replica needs roughly the same config.
- The config source is a shared scaling bottleneck — typical examples: Kubernetes API + etcd, a database, an auth server, a config repo you'd otherwise poll.
- Change volume is low relative to poll rate (so most polls return 304 — ETag caching pays off).
- Kubernetes Informers are not a sufficient answer — informer push still requires the API server to fan change events out to all N clients, which under a burst reproduces the same thundering herd on etcd and the API server (Zalando's stated reason for rejecting them).
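The fan-out reduction and the "mostly 304s" condition are simple arithmetic; plugging in the numbers from the canonical instance:

```python
replicas = 300    # Skipper pods per cluster (post-rollout HPA ceiling)
interval_s = 3    # poll cadence shared by proxy and clients

# Mostly-304 traffic absorbed by the cheap HTTP tier (the proxy):
proxy_rps = replicas / interval_s       # 300 / 3 = 100 rps

# Load remaining on the shared upstream (K8s API / etcd):
upstream_polls_per_s = 1 / interval_s   # one poller, every 3 s
```

This is where the ~100 rps figure comes from: 300 clients at a 3-second cadence, with almost every response a body-less 304 because change volume is far below the poll rate.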
Trade-offs¶
- Freshness floor equals poll interval. Operators upstream of the proxy can't get sub-interval propagation to data-plane pods — see concepts/polling-interval-as-freshness-budget.
- The proxy is a new SPOF. Mitigate with last-known-good fallback on the data plane so the proxy being down is a freshness regression, not a traffic outage.
- ETag granularity is a real design choice. One ETag per whole table (Zalando's choice) means every change triggers a full refetch by all clients. Per-resource ETags narrow that at the cost of server complexity.
- Rollout risk. Inserting a new control-plane tier on the critical path is high-stakes — pair with three-mode rollout to diff old vs new outputs before cutover.
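The granularity trade-off above can be made concrete. In this sketch (route names and the hash scheme are made up for illustration), a single changed route moves the table-wide ETag — forcing all N clients to refetch everything — while per-resource ETags isolate the change:

```python
import hashlib


def etag(data):
    return hashlib.sha256(data).hexdigest()[:12]


v1 = {"route-a": b"a -> svc1", "route-b": b"b -> svc2"}
v2 = {"route-a": b"a -> svc1", "route-b": b"b -> svc3"}  # only route-b changed


def table_wide(table):
    # One ETag over the whole table, hashed in a deterministic key order.
    return etag(b"|".join(table[k] for k in sorted(table)))


# Per-resource: one ETag per route; only the changed route's ETag moves.
per_v1 = {k: etag(v) for k, v in v1.items()}
per_v2 = {k: etag(v) for k, v in v2.items()}
changed = [k for k in v1 if per_v1[k] != per_v2[k]]
```

The per-resource variant narrows refetch traffic to the changed routes, at the cost of the server tracking many ETags and clients issuing per-resource requests.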
Operational numbers (Zalando)¶
| Dimension | Before | After |
|---|---|---|
| Who polls K8s API | ~180 Skippers | 1 Route Server per cluster |
| Skipper HPA ceiling | ~180 | 300 |
| Poll interval | per-Skipper | 3 seconds, centralised |
| etcd overload | threatening | resolved |
| API-server CPU throttle | yes | resolved |
| GMV loss on rollout | — | 0 |
Contrast with other control-plane shapes¶
- xDS / gRPC streaming (Envoy) — push-based; lower median propagation latency but requires xDS server infrastructure and per-client connection state. ETag polling is simpler and stateless on the server.
- Sidecar informer cache — coalesces at the process boundary, not across replicas; doesn't solve the N-replicas fan-out.
- Event bus (Kafka / Redis topic) — viable but adds unrelated infra; the ETag-proxy pattern is a minimal single-process solution when HTTP is already in the stack.
See also¶
- concepts/control-plane-fan-out-to-kubernetes-api — the anti-pattern this pattern addresses.
- concepts/etag-conditional-polling — the wire protocol.
- concepts/last-known-good-routing-table — the data-plane fallback that makes the proxy non-critical.
- concepts/polling-interval-as-freshness-budget — the externally visible cost.
- patterns/three-mode-rollout-off-shadow-exec — how to roll this out safely when it sits on the critical path.
- systems/zalando-route-server — the canonical instance.