PATTERN Cited by 2 sources
Slow-start ramp-up (new-pod warmup)¶
Slow-start ramp-up: when a newly registered backend appears in the load-balancer's pool, cap its share of traffic at a low fraction and ramp it up over a warm-up window — rather than letting it immediately take its fair share. Standard shapes: linear ramp (0 → 100% over N seconds), error-aware ramp (slow down further if error rate is elevated), or separate warmup traffic path.
The problem it solves: fresh pods haven't warmed their caches / JIT / connection pools; the first burst of real traffic hits cold paths and either (a) sees bad tail latency, (b) errors out, or (c) trips circuit breakers and gets marked unhealthy even though the pod is fine, just not warm.
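The two ramp shapes above can be sketched as a single weight-multiplier function. This is an illustrative sketch, not any LB's actual API; the window, floor, and error threshold are assumed values:

```python
import time

def slow_start_weight(registered_at, now=None, window_s=30.0,
                      min_fraction=0.1, error_rate=0.0,
                      error_threshold=0.05):
    """Weight multiplier for a newly registered backend (hypothetical).

    Linear ramp from min_fraction to 1.0 over window_s seconds after
    registration; if the pod's observed error rate is elevated, hold at
    the floor instead of ramping (error-aware backoff).
    """
    now = time.time() if now is None else now
    age = max(0.0, now - registered_at)
    ramp = min(1.0, age / window_s)
    if error_rate > error_threshold:
        # Error-aware backoff: freeze at the floor rather than ramp up.
        return min_fraction
    return max(min_fraction, ramp)
```

The LB would multiply the endpoint's base weight by this fraction on each pick, so a fresh pod starts at 10% of its fair share and reaches 100% only after the warm-up window elapses.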
When the problem bites¶
Slow-start matters most when the LB is both per-request and learns about fresh pods quickly. Three common cases:
- Client-side LB with fast discovery. An xDS pipeline (see patterns/proxyless-service-mesh) registers a new pod within milliseconds; the client's LB algorithm happily starts sending it a full P2C-fair share immediately.
- Rolling deploys. Blue-green flips or surge pods come online hot.
- Autoscaling. HPA / KEDA adds pods during traffic spikes — exactly when the surge will crash the new pods if they don't ramp.
Before client-side LB, sidecar/L4 designs mostly avoided this by long-lived connection reuse: existing callers kept their connections to old pods; new pods only got traffic as new connections rolled in, which was naturally slow. Client-side per-request LB removes this accidental warm-up window, exposing the cold-start problem at scale.
Implementation shapes¶
- Time-based linear ramp. Endpoint weight scales from 0% → 100% over N seconds after registration. Envoy supports exactly this via `slow_start_config`.
- Error-aware backoff on top. If the new pod shows an elevated error rate, slow the ramp further or pause it. Prevents thrashing when a genuinely broken pod joins.
- Dedicated warmup framework. A separate code path that hits the pod with safe, representative requests before the LB raises its weight. Lets the app load caches and JIT-compile hot paths under controlled load.
- Readiness gate. The pod doesn't advertise `Ready` to EndpointSlices until a warmup script completes; the control plane then never offers the pod as an LB target until it's warmed.
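For the Envoy case, a minimal cluster fragment might look like the following (field names follow Envoy's slow-start support; the values are illustrative, not a recommendation):

```yaml
clusters:
- name: backend
  lb_policy: ROUND_ROBIN
  round_robin_lb_config:
    slow_start_config:
      slow_start_window: 30s       # ramp window after a host joins
      aggression:
        default_value: 1.0         # 1.0 = linear ramp; >1 ramps faster early
        runtime_key: slow_start.aggression
```

The same `slow_start_config` block is also accepted under `least_request_lb_config` for the least-request policy.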
Seen in¶
- sources/2025-10-01-databricks-intelligent-kubernetes-load-balancing — explicit cold-start challenge surfaced by the rollout: "new pods began receiving traffic immediately, which surfaced cold-start issues where they handled requests before being fully warmed up." Fix combines (a) slow-start ramp-up, (b) biasing traffic away from pods with higher observed error rates, and (c) a dedicated warmup framework. Databricks names all three as necessary once you move to client-side LB.
- sources/2024-10-28-dropbox-robinhood-in-house-load-balancing — Robinhood runs slow-start as a corollary of feedback control, not a separate mechanism: a new node arriving with 0 utilization would cause PID feedback to oscillate if given "fair share" immediately; instead, the LBS sets new-node weight to a low initial value and lets the PID controller ramp it up to fleet-average over a few control cycles. A structural rather than explicit-timer-based slow-start.
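The Robinhood-style structural slow-start can be sketched as a feedback loop that nudges a new node's weight toward the fleet average each control cycle. The initial fraction, gain, and cycle count below are assumptions for illustration, not Dropbox's values, and only the proportional term of a PID controller is shown:

```python
def ramp_new_node(fleet_avg_weight, initial_fraction=0.1,
                  gain=0.5, cycles=8):
    """Feedback-controlled slow-start sketch (hypothetical parameters).

    Instead of an explicit timer, the controller moves the new node's
    weight a fraction of the way toward the fleet average on every
    control cycle; a full PID would also react to observed utilization.
    """
    w = initial_fraction * fleet_avg_weight  # low initial weight
    history = [w]
    for _ in range(cycles):
        w += gain * (fleet_avg_weight - w)   # proportional step
        history.append(w)
    return history
```

With these numbers the weight converges to within ~1% of fleet average after eight cycles, which is the "ramp over a few control cycles" behavior the source describes, without any wall-clock window.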
Related¶
- concepts/cold-start — the underlying latency phenomenon
- concepts/client-side-load-balancing — the deployment model that surfaces this problem
- patterns/proxyless-service-mesh — the architectural class where it's most acute