PATTERN Cited by 1 source
Conservative capacity bin-packing during incident¶
Problem¶
An upstream capacity-provisioning failure (e.g. EC2 launch failure) has frozen the fleet at its current size. A known peak-traffic window is about to hit. The usual response — let the autoscaler add instances — is not available. Dropping traffic is not acceptable: these are paid customers, and the request load is real.
The fleet has finite running capacity and finite CPU headroom on each existing process. The question is how to trade that headroom for peak coverage during the incident.
Solution¶
Bin-pack the workload more tightly than steady-state scheduling normally allows. Co-locate more processes per host than usual, or raise per-process utilisation ceilings, so the existing fleet can serve peak demand without needing new instances. Accept that the fleet is running closer to CPU capacity than is typical — and reverse the tighter packing once provisioning capability returns.
Verbatim from PlanetScale's 2025-10-20 incident post:
The most important intervention, though, was to temporarily change how we schedule vtgate processes for customers with autoscaling configured. We bin-packed vtgate processes more tightly than usual, running closer to CPU capacity than is typical, in order to provide ample capacity for the US work day.
The 2025-10-20 post-mortem names this as "the most important intervention" of the phase-2 response playbook.
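To make the headroom trade concrete, here is a back-of-envelope sketch. The post discloses no utilisation figures, so every number below is invented for illustration; the point is the arithmetic: on a frozen fleet, raising the per-process CPU ceiling is a capacity multiplier, and that multiplier can stand in for the autoscaler ramp during the peak window.

```python
# Back-of-envelope sketch with invented numbers; the incident post discloses none.
steady_processes = 400           # fleet size frozen by the EC2 launch failure (hypothetical)
qps_per_process_full = 1_000     # throughput one process could serve at 100% CPU (hypothetical)

steady_ceiling = 0.65            # normal utilisation target
incident_ceiling = 0.90          # temporarily accepted during the incident

steady_capacity = steady_processes * qps_per_process_full * steady_ceiling
incident_capacity = steady_processes * qps_per_process_full * incident_ceiling

peak_multiplier = 1.3            # hypothetical Monday-morning ramp over steady state
peak_demand = steady_capacity * peak_multiplier

print(f"steady capacity:   {steady_capacity:,.0f} qps")
print(f"peak demand:       {peak_demand:,.0f} qps")
print(f"incident capacity: {incident_capacity:,.0f} qps "
      f"({incident_ceiling / steady_ceiling:.2f}x steady)")
# With these numbers the raised ceiling (0.90 / 0.65 ≈ 1.38x) covers a 1.3x peak
# without launching a single new instance.
```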
Mechanics¶
The scheduling change typically has two dials:
- Density — more processes per host. For Kubernetes-scheduled workloads, lower the per-pod CPU request so more pods fit on each node; for process-per-host deployments, co-locate services that usually run on separate hosts. A packing sketch follows this list.
- Utilisation ceiling — accept a higher steady-state CPU percentage per process. Typical SRE practice is to keep fleets at 40–70% of CPU to absorb spikes; during the incident, let them sit closer to 85–95% and rely on whatever headroom remains to absorb short-burst peaks.
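A minimal sketch of the density dial, assuming a hypothetical 16-core node and uniform per-process CPU requests. Real schedulers (Kubernetes included) also weigh memory, affinity, and spread constraints; this shows only the packing arithmetic.

```python
# First-fit-decreasing packing sketch; all numbers are hypothetical.
def pack(cpu_requests, node_cpu):
    """Place each CPU request on the first node with room; return per-node usage."""
    nodes = []
    for req in sorted(cpu_requests, reverse=True):
        for i, used in enumerate(nodes):
            if used + req <= node_cpu:
                nodes[i] = used + req
                break
        else:
            nodes.append(req)  # no existing node fits: open a new one
    return nodes

NODE_CPU = 16                 # allocatable cores per node (hypothetical)
ramp = [4.0] * 120            # 120 vtgate-like processes the autoscale ramp wants

steady = pack(ramp, NODE_CPU)                         # steady-state 4-core requests
incident = pack([r * 0.625 for r in ramp], NODE_CPU)  # requests cut to 2.5 cores

print(f"steady:   {len(steady)} nodes, {len(ramp) / len(steady):.0f} pods/node")
print(f"incident: {len(incident)} nodes, {len(ramp) / len(incident):.0f} pods/node")
# Tighter requests fit 6 pods per node instead of 4: the same 120-process ramp
# needs 20 nodes rather than 30, so a frozen node pool absorbs more of the ramp.
```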
The trade: latency percentiles will degrade (less headroom for GC pauses, less slack for request-queue spikes), error rate may tick up under transient surges, and cache locality and noisy-neighbour effects get worse.
When this is right¶
- The alternative is dropping traffic. When the choice is "slight tail-latency degradation" vs "failed requests or queued admissions," tighter packing wins.
- The incident has a known or expected duration. Operating at 90% CPU for a 12-hour EC2-launch outage is tolerable; operating there indefinitely is a bad steady state.
- The workload is mostly stateless. Stateless proxies (vtgate in the 2025-10-20 case, Envoy, any stateless gateway) are good fits — per-process state is small, re-bin-packing is a scheduling change rather than a data migration.
- You can reverse it quickly. When capacity returns, loosen the packing back to steady-state so you don't accumulate tail-latency debt as a new normal.
When this is wrong¶
- Stateful workloads. Databases and caches have per-node memory footprints that don't compress the way CPU does; tightening a database's bin-pack means evicting working-set pages, not just running hotter.
- Workloads with hard tail-latency SLOs. Some paths really can't tolerate 85% CPU — real-time trading, real-time ad bidding, some video-streaming paths.
- The fleet is already near its ceiling. If steady-state is already 75% CPU, there isn't room to tighten further without crossing the cliff into starvation / GC-spiral regimes; the queueing sketch below shows how sharply latency grows near that ceiling.
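A crude way to see the cliff: treat each process as an M/M/1 server (an assumption made purely for illustration, not a claim about vtgate or any specific workload). Mean time in system is the zero-load service time divided by (1 − utilisation), so each additional point of utilisation near the ceiling costs far more latency than the one before it.

```python
# M/M/1 illustration of the utilisation cliff; the 2 ms service time is hypothetical.
service_time_ms = 2.0  # mean time to serve one request with no queueing

for rho in (0.50, 0.70, 0.85, 0.90, 0.95, 0.98):
    latency_ms = service_time_ms / (1.0 - rho)  # M/M/1 mean time in system
    print(f"utilisation {rho:.2f} -> mean latency ≈ {latency_ms:6.1f} ms")
# 0.50 -> 4 ms, 0.70 -> ~6.7 ms, 0.90 -> 20 ms, 0.95 -> 40 ms, 0.98 -> 100 ms:
# going from an already-hot 0.75 toward 0.95 is a very different move than
# going from 0.50 to 0.70, which is why this pattern assumes slack to start with.
```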
Composition with other incident-response moves¶
Conservative bin-packing rarely appears alone; it sits inside a broader playbook that aims to conserve existing capacity while reducing demand for new capacity:
- Pair with patterns/suspend-routine-capacity-churn-during-dependency-outage — pause the drain-and-terminate loop, hold vacated instances. Tighter bin-packing + preserved instance inventory = maximum concurrent running capacity.
- Pair with patterns/shed-load-during-capacity-shortage — cancel pending backups, advise customers to pause ETLs and delay queues. Reduces peak demand so the tighter pack has enough headroom.
- Redirect new-resource creation to an unaffected region so the packed fleet doesn't have to absorb growth.
Together these levers let a frozen-size fleet survive a peak traffic window that would normally require an autoscaler ramp.
Seen in¶
- sources/2025-11-03-planetscale-aws-us-east-1-incident-2025-10-20 — PlanetScale, Richard Crowley, 2025-11-03. Canonical wiki application. Phase 2 of the 2025-10-20 AWS us-east-1 incident: EC2-launch-failure window meets US-East-Coast Monday-morning vtgate-autoscale ramp. PlanetScale bin-packs vtgate processes tighter than usual, named as "the most important intervention" of the playbook. No numbers disclosed for the pre/post CPU utilisation or per-customer effect; the claim is qualitative ("ample capacity for the US work day").
Related¶
- concepts/bin-packing — the scheduling primitive this pattern adjusts the dial on.
- concepts/ec2-launch-failure-mode — the fault class that triggers the need for this pattern.
- concepts/diurnal-autoscaling-risk — the risk amplifier this pattern counters.
- systems/vtgate — the canonical workload that admits this intervention cleanly (stateless query router).
- patterns/suspend-routine-capacity-churn-during-dependency-outage — the conserve companion lever.
- patterns/shed-load-during-capacity-shortage — the reduce-demand companion lever.