CONCEPT Cited by 2 sources
Scaling latency¶
Scaling latency is the time between "we need more capacity" and "that capacity is actually serving traffic." For a cluster scaler, it decomposes into roughly:
scaling_latency =
detection_time # how long to notice we're under-resourced
+ decision_time # pick an instance type / count / zone
+ provisioning_time # cloud provider actually launches compute
+ bootstrap_time # boot + userdata + kubelet register
+ pod_scheduling_time # scheduler places pods on the new node
+ pod_ready_time # container pull + startup + healthcheck
Different autoscalers fail on different parts of this sum.
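The decomposition above is just a sum over stage timings, and the "why it matters" condition is a comparison against spike duration. A minimal sketch — the stage numbers below are illustrative placeholders, not measurements from any real cluster:

```python
from dataclasses import dataclass

@dataclass
class ScalingStages:
    """One timing (seconds) per stage of the decomposition above."""
    detection: float       # notice we're under-resourced
    decision: float        # pick instance type / count / zone
    provisioning: float    # cloud provider launches compute
    bootstrap: float       # boot + userdata + kubelet register
    pod_scheduling: float  # scheduler places pods on the new node
    pod_ready: float       # container pull + startup + healthcheck

    def total(self) -> float:
        return (self.detection + self.decision + self.provisioning
                + self.bootstrap + self.pod_scheduling + self.pod_ready)


def spike_passes_through(stages: ScalingStages, spike_duration_s: float) -> bool:
    """True when capacity arrives only after the spike has already hit users."""
    return stages.total() > spike_duration_s


# Placeholder timings for illustration only; measure your own cluster.
stages = ScalingStages(detection=10, decision=1, provisioning=40,
                       bootstrap=60, pod_scheduling=2, pod_ready=30)
print(stages.total())                    # 143.0 seconds end to end
print(spike_passes_through(stages, 60))  # True: a 60s spike outruns scaling
```

The point of modeling it this way is that an autoscaler improvement only helps if it shrinks a stage that dominates the sum.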
Why it matters¶
- User-visible p99 latency during traffic spikes. If scaling_latency > spike_duration, the spike passes through the system as errors, queue backpressure, and timeouts before capacity catches up.
- Over-provisioning as compensation. Teams with slow autoscalers over-provision baseline capacity to hide scaling latency, which converts an autoscaler problem into a cost problem.
- Service-level disruption. Salesforce's post names "multi-minute delays during demand spikes and degraded user experience" as one of the central ASG / Cluster Autoscaler limitations that drove the migration to Karpenter (Source: sources/2026-01-12-aws-salesforce-karpenter-migration-1000-eks-clusters).
The Cluster Autoscaler / ASG tax¶
Kubernetes's Cluster Autoscaler on AWS inflates scaling latency through a chain of indirections:
1. Cluster Autoscaler sees an unschedulable pod.
2. It asks the matching Auto Scaling group to grow.
3. The ASG calls EC2 RunInstances.
4. The instance launches, boots, attaches its ENI, runs userdata, and the kubelet registers.
5. Now the scheduler can place pods.
Each hop has its own latency budget; the sum at scale is minutes.
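The hop chain can be made concrete by summing a per-hop budget; the budgets below are illustrative placeholders, not AWS measurements:

```python
# Each (hop, budget_s) pair is one indirection in the CA -> ASG -> EC2 path.
# Budgets are invented for illustration; real values vary by cluster and region.
ca_asg_hops = [
    ("CA notices the unschedulable pod (scan interval)", 10),
    ("CA bumps the matching ASG's desired capacity", 5),
    ("ASG calls EC2 RunInstances", 30),
    ("instance boot + ENI attach + userdata + kubelet register", 90),
    ("scheduler places pods on the new node", 5),
]

total_s = sum(budget for _, budget in ca_asg_hops)
print(f"{total_s}s ≈ {total_s / 60:.1f} minutes")  # 140s ≈ 2.3 minutes
```

Even with generous per-hop budgets, the chain sums to minutes: no single hop is catastrophic, which is exactly why the latency is hard to attack hop by hop.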
Karpenter's collapse to seconds¶
Karpenter skips steps 2-3:
1. Karpenter sees pending pods.
2. It picks instance types + AZs by solving a bin-packing problem over pending pods vs. the EC2NodeClass's allowed instance types.
3. It calls EC2 RunInstances directly.
4. The node boots and registers.
5. The scheduler places pods.
Salesforce reports minutes → seconds on this metric post-migration (Source: sources/2026-01-12-aws-salesforce-karpenter-migration-1000-eks-clusters). The win isn't a faster step; it's eliminating the ASG round trip entirely.
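The decision in step 2 can be sketched as first-fit-decreasing bin packing. The pod sizes and instance menu below are invented for illustration, and real Karpenter solves a richer problem (multiple resources, AZs, price, capacity type) — this is only the shape of the technique:

```python
# Hypothetical pod CPU requests (cores) and an allowed-instance-type menu
# standing in for an EC2NodeClass; names and sizes are illustrative only.
pending_pods = [3.0, 2.0, 2.0, 1.0, 0.5]
instance_menu = {"c5.large": 2, "c5.xlarge": 4, "c5.2xlarge": 8}


def pick_nodes(pods: list[float], menu: dict[str, int]) -> list[str]:
    """First-fit decreasing: place pods big-first onto open nodes,
    opening the smallest instance type that fits when none can hold a pod."""
    nodes: list[list] = []  # each entry: [instance_type, free_cpu]
    for pod in sorted(pods, reverse=True):
        for node in nodes:
            if node[1] >= pod:       # first open node with room wins
                node[1] -= pod
                break
        else:                        # no open node fits: launch a new one
            itype, cap = min(((t, c) for t, c in menu.items() if c >= pod),
                             key=lambda tc: tc[1])
            nodes.append([itype, cap - pod])
    return [n[0] for n in nodes]


print(pick_nodes(pending_pods, instance_menu))
# ['c5.xlarge', 'c5.large', 'c5.large', 'c5.large']
```

Because the solver sees the actual pending pods, it can launch exactly-fitting nodes instead of growing a homogeneous ASG by one fixed instance at a time.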
Related¶
- concepts/cold-start — similar single-request latency concept for serverless; scaling latency is the multi-instance analog.
- concepts/bin-packing — Karpenter's algorithm in step 2 above.
- systems/karpenter / systems/cluster-autoscaler — the two autoscalers whose scaling-latency profiles differ structurally.
- systems/aws-auto-scaling-groups — the indirection CA inherits.
Seen in¶
- sources/2026-01-12-aws-salesforce-karpenter-migration-1000-eks-clusters — scaling latency is named as one of four structural motivations for the 1,000-cluster migration; CA/ASG minutes → Karpenter seconds is reported as a headline outcome.
- sources/2026-04-07-mongodb-predictive-auto-scaling-an-experiment — managed-database-tier instance of the same latency force: MongoDB Atlas's reactive auto-scaler pays "a few minutes of overload" detection + "several minutes" scaling operation on top of one-tier-at-a-time motion. Predictive auto-scaling hides the full scaling-latency sum from observed p99 by starting the op before load arrives. Sibling to the cluster-scheduler framing above: same force (detection + decision + provisioning + warmup), different layer, different remediation (predictive forecasting vs. per-pod node provisioning).