PATTERN
Shed load during capacity shortage¶
Problem¶
During a capacity-provisioning outage (e.g. EC2 launch failure), the fleet is frozen at its current size and a peak-traffic window is approaching. The fleet must absorb the peak with the capacity it already has.
Two knobs are available: raise per-instance utilisation (via tighter bin-packing), or reduce the demand that the frozen fleet has to serve. This pattern is the demand-reduction side of the capacity balance.
Solution¶
Identify non-critical or deferrable demand on the affected fleet and remove it from the hot path for the duration of the incident. The specific mechanism depends on who controls the demand — operator-side demand (backups, ETLs, maintenance jobs) is directly cancellable; customer-side demand requires a request to the caller.
Three canonical cuts, verbatim from PlanetScale's 2025-10-20 incident post:
- Temporarily disallowed creating new databases in AWS us-east-1 and changed the default region for new databases to AWS us-east-2.
- Delayed scheduling additional backups and canceled pending backups that were waiting to launch an EC2 instance.
- Advised PlanetScale Managed customers using vtgate autoscaling to shed whatever load they could by e.g. delaying queue processing or pausing ETL processes.
The last bullet is the customer-side request — explicit coaching that queues and ETLs are the first things to defer when a downstream system is capacity-constrained.
Mechanics¶
Three categories of sheddable load:
Operator-controlled background work¶
- Scheduled backups — delay or cancel. Backups are durability infrastructure; missing a backup cycle is usually survivable (RPO degrades briefly; the next backup captures everything).
- Scheduled migrations / rebalances / reshards — pause. These are future-capacity work; they can wait.
- Compaction / vacuum / cleanup jobs — throttle or pause. They consume the same compute the hot path needs.
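The operator-side cuts above can be gated behind a single incident flag in the job scheduler. A minimal sketch, not from the source — the `Scheduler`, `Job`, and `Priority` names are illustrative; the point is that deferrable work (backups, reshards, compactions) is parked rather than run while the flag is set, and drained once capacity returns:

```python
from dataclasses import dataclass, field
from enum import Enum

class Priority(Enum):
    CRITICAL = 0      # hot-path work: never shed
    DEFERRABLE = 1    # backups, reshards, compactions: sheddable

@dataclass
class Job:
    name: str
    priority: Priority

@dataclass
class Scheduler:
    incident_mode: bool = False
    deferred: list = field(default_factory=list)

    def submit(self, job: Job) -> str:
        # During a capacity incident, deferrable work is parked instead of run.
        if self.incident_mode and job.priority is Priority.DEFERRABLE:
            self.deferred.append(job)
            return "deferred"
        return "dispatched"

    def end_incident(self) -> list:
        # Once capacity returns, drain the backlog — the next backup
        # cycle catches up, so RPO only degrades briefly.
        drained, self.deferred = self.deferred, []
        self.incident_mode = False
        return drained
```

Cancelling pending backups that were waiting on an EC2 launch is the same move applied retroactively: jobs already queued get moved to `deferred` instead of retrying the launch.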
New-resource creation¶
- Redirect defaults to unaffected regions — new databases go to us-east-2 instead of us-east-1 until us-east-1 provisioning returns.
- Block / queue new-resource requests in the affected region. "Temporarily disallowed creating new databases in AWS us-east-1."
Customer-side demand¶
- Ask customers with deferrable batch loads to pause them. The canonical examples: ETL pipelines, async queue processors, analytics ingestion, cron-triggered reports.
- Ask customers on autoscaling tiers to temporarily disable growth-triggering jobs. If the fleet can't grow, don't trigger the autoscaler.
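On the customer side, "pause queue processing" usually means a single shed flag checked by the batch consumer loop. A minimal sketch, not from the source — `process_batch` and `shed_flag` are illustrative names; the flag would typically be a config value or feature flag the customer flips on the operator's advisory:

```python
import queue

def process_batch(q: queue.Queue, shed_flag, handle) -> int:
    """Drain a work queue unless the shed flag is set; return items processed."""
    processed = 0
    while not shed_flag() and not q.empty():
        handle(q.get_nowait())   # deferrable work: ETL rows, queue messages
        processed += 1
    return processed
```

Work left in the queue is not lost, only delayed — which is exactly why queues and ETLs are the first things to defer.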
When this is right¶
- The shed load has a legitimate low-priority classification. Backups, ETLs, analytics, maintenance — all deferrable for hours or a day with no real-world harm.
- The peak demand is known and concentrated. Shedding buys headroom for the peak; once the peak passes the shed load can be made up.
- Customer communication channels are available. Shedding customer-side load requires reaching customers; if the status page is down (see SaaS dependency risk) the shed-request may not reach them in time.
When this is wrong¶
- The shed load has real-time SLOs. A trading fleet can't shed its order-processing queue; a messaging fleet can't pause its delivery path; a payment fleet can't defer authorization.
- Shedding creates correctness risk. Pausing a scheduled garbage collection may delay a durability guarantee; pausing a cache warm-up may cause a cache-hit-rate cliff later.
- The load is already shed and there's nothing more to cut.
Composition¶
This is the demand-reduction lever of the incident-response playbook; it composes with:
- patterns/conservative-capacity-bin-packing-during-incident — squeeze more out of what's there.
- patterns/suspend-routine-capacity-churn-during-dependency-outage — don't lose what's there.
- Together, these three form the "frozen-fleet survives peak window" operator playbook.
Contrast with request-time load shedding¶
patterns/shed-low-priority-under-load is the request-time, in-path version: a live request evaluator drops low-priority requests at admission when the server is saturated. The pattern documented here is operator-time, out-of-path — operators edit schedules, cancel jobs, and contact customers to pause batch loads before saturation hits. Same intent (reduce demand); different control loop (humans and cron vs request admission control). In a real incident, both usually run simultaneously.
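For contrast, the request-time variant can be sketched as a priority-aware admission check. This is an illustrative sketch of the sibling pattern, not of this one — the `admit` function and the 0.8 cutoff are assumptions, not from the source:

```python
def admit(request_priority: str, in_flight: int, capacity: int,
          low_priority_cutoff: float = 0.8) -> bool:
    # Request-time shedding: reject low-priority work once utilisation
    # crosses a threshold. Operator-time shedding (this pattern) removes
    # the deferrable demand before these requests ever arrive.
    utilisation = in_flight / capacity
    if utilisation >= 1.0:
        return False  # saturated: reject everything
    if request_priority == "low" and utilisation >= low_priority_cutoff:
        return False  # headroom reserved for high-priority traffic
    return True
```

The two compose: operator-time shedding lowers `in_flight` so the admission controller rarely has to fire.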
Seen in¶
- sources/2025-11-03-planetscale-aws-us-east-1-incident-2025-10-20 — PlanetScale, Richard Crowley, 2025-11-03. Canonical wiki application. Phase 2 of the 2025-10-20 AWS us-east-1 incident. Three concrete shed-moves (new-DB redirect to us-east-2, backup cancellation, customer ETL/queue pause advisory) as part of the incident response. No disclosure of demand reduction numbers; narrative suggests it was sufficient combined with tighter bin-packing to cover the US-East-Coast Monday-morning peak.
Related¶
- concepts/ec2-launch-failure-mode — the fault class that makes this pattern necessary.
- concepts/diurnal-autoscaling-risk — the specific risk surface this pattern counters when shedding autoscale-triggering demand.
- concepts/blast-radius — shed-load bounds the blast radius by keeping the frozen fleet below saturation.
- patterns/conservative-capacity-bin-packing-during-incident — sister pattern (supply-side densification).
- patterns/suspend-routine-capacity-churn-during-dependency-outage — sister pattern (supply-side preservation).
- patterns/shed-low-priority-under-load — request-time variant for in-path load shedding.