
PATTERN

Shed load during capacity shortage

Problem

During a capacity-provisioning outage (e.g. EC2 launch failure), the fleet is frozen at its current size and a peak-traffic window is approaching. The fleet must absorb the peak with the capacity it already has.

Two knobs are available: raise per-instance utilisation (via tighter bin-packing), or reduce the demand that the frozen fleet has to serve. This pattern is the demand-reduction side of the capacity balance.
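
To make the balance concrete, here is a back-of-the-envelope check in Python. Every number below is hypothetical; the incident post discloses no figures.

    # Capacity balance for a frozen fleet (all numbers illustrative).
    # Constraint: peak_demand <= fleet_size * per_instance_capacity * max_utilisation
    fleet_size = 400              # instances, frozen by the provisioning outage
    per_instance_capacity = 100   # req/s per instance at 100% utilisation
    max_utilisation = 0.70        # normal bin-packing target

    supply = fleet_size * per_instance_capacity * max_utilisation  # 28,000 req/s
    peak_demand = 30_000                                           # forecast peak, req/s
    shortfall = peak_demand - supply                               # 2,000 req/s

    # Lever 1: tighter bin-packing (raising utilisation to 0.75 yields 30,000 req/s).
    print(fleet_size * per_instance_capacity * 0.75 >= peak_demand)  # True
    # Lever 2 (this pattern): shed >= 2,000 req/s of deferrable demand instead.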

Solution

Identify non-critical or deferrable demand on the affected fleet and remove it from the hot path for the duration of the incident. The specific mechanism depends on who controls the demand — operator-side demand (backups, ETLs, maintenance jobs) is directly cancellable; customer-side demand requires a request to the caller.

Three canonical cuts, quoted verbatim from PlanetScale's post on the 2025-10-20 incident:

  • Temporarily disallowed creating new databases in AWS us-east-1 and changed the default region for new databases to AWS us-east-2.
  • Delayed scheduling additional backups and canceled pending backups that were waiting to launch an EC2 instance.
  • Advised PlanetScale Managed customers using vtgate autoscaling to shed whatever load they could by e.g. delaying queue processing or pausing ETL processes.

The last bullet is the customer-side request — explicit coaching that queues and ETLs are the first things to defer when a downstream system is capacity-constrained.

Mechanics

Three categories of sheddable load:

Operator-controlled background work

  • Scheduled backups — delay or cancel. Backups are durability infrastructure; missing a backup cycle is usually survivable (RPO degrades briefly; the next backup captures everything).
  • Scheduled migrations / rebalances / reshards — pause. These are future-capacity work; they can wait.
  • Compaction / vacuum / cleanup jobs — throttle or pause. They consume the same compute the hot path needs. (A minimal gate sketch follows this list.)
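
A minimal sketch of the operator-side gate, assuming a flag-driven scheduler; the flag name, job shapes, and deferrable set are illustrative, not PlanetScale's implementation.

    import datetime

    # Background work the scheduler may defer during a capacity shortage
    # (illustrative set).
    DEFERRABLE_JOB_TYPES = {"backup", "migration", "rebalance", "compaction"}

    def should_launch(job, incident_flags):
        """Decide at schedule time whether a background job may launch."""
        shortage = incident_flags.get("capacity_shortage")
        if shortage and job["type"] in DEFERRABLE_JOB_TYPES:
            # Defer, don't delete: re-queue the job for after the incident.
            job["not_before"] = shortage["expected_end"]
            return False
        return True

    # Flipping one flag stops the cron loop from feeding the frozen fleet.
    flags = {"capacity_shortage": {"region": "us-east-1",
                                   "expected_end": datetime.datetime(2025, 10, 20, 18, 0)}}
    print(should_launch({"type": "backup", "db": "prod-1"}, flags))   # False: deferred
    print(should_launch({"type": "restore", "db": "prod-2"}, flags))  # True: not deferrable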

New-resource creation

  • Redirect defaults to unaffected regions — new databases go to us-east-2 instead of us-east-1 until us-east-1 provisioning returns.
  • Block / queue new-resource requests in the affected region. "Temporarily disallowed creating new databases in AWS us-east-1." (Both moves are sketched after this list.)
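
Both moves fit in one placement check in the control plane. A minimal sketch, assuming a single entry point for new-database creation; the names (AFFECTED_REGIONS, DEFAULT_REGION, RegionUnavailable) are illustrative.

    AFFECTED_REGIONS = {"us-east-1"}   # provisioning is broken here
    DEFAULT_REGION = "us-east-2"       # temporary default while us-east-1 is frozen

    class RegionUnavailable(Exception):
        pass

    def place_new_database(requested_region=None):
        """Choose a region for a new database under the incident policy."""
        if requested_region is None:
            return DEFAULT_REGION      # redirect the default to an unaffected region
        if requested_region in AFFECTED_REGIONS:
            # Block rather than queue: creation here would demand capacity
            # the frozen fleet cannot provision.
            raise RegionUnavailable(
                f"new databases temporarily disallowed in {requested_region}")
        return requested_region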

Customer-side demand

  • Ask customers with deferrable batch loads to pause them. The canonical examples: ETL pipelines, async queue processors, analytics ingestion, cron-triggered reports.
  • Ask customers on autoscaling tiers to temporarily disable growth-triggering jobs. If the fleet can't grow, don't trigger the autoscaler. (A customer-side worker sketch follows this list.)
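
On the customer side the mechanism is just a pause that preserves the backlog. A minimal sketch, assuming a queue worker that polls an advisory flag published by the provider; the flag name and worker shape are illustrative.

    import queue
    import time

    def process(msg):
        """Hypothetical per-message ETL step."""
        print("processed", msg)

    def run_etl_worker(work_queue, provider_advisory, poll_seconds=60):
        """Drain the queue, but pause (never drop) while the provider asks for shed."""
        while True:
            if provider_advisory.get("shed_deferrable_load"):
                # Deferrable by design: messages stay queued, so pausing trades
                # latency for headroom on the provider's capacity-frozen fleet.
                time.sleep(poll_seconds)
                continue
            process(work_queue.get())  # blocks until a message is available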

When this is right

  • The shed load has a legitimate low-priority classification. Backups, ETLs, analytics, maintenance — all deferrable for hours or a day with no real-world harm.
  • The peak demand is known and concentrated. Shedding buys headroom for the peak; once the peak passes, the shed load can be made up.
  • Customer communication channels are available. Shedding customer-side load requires reaching customers; if the status page is down (see SaaS dependency risk) the shed request may not reach them in time.

When this is wrong

  • The shed load has real-time SLOs. A trading fleet can't shed its order-processing queue; a messaging fleet can't pause its delivery path; a payment fleet can't defer authorization.
  • Shedding creates correctness risk. Pausing a scheduled garbage collection may delay a durability guarantee; pausing a cache warm-up may cause a cache-hit-rate cliff later.
  • The load is already shed and there's nothing more to cut.

Composition

This is the demand-reduction lever of the incident-response playbook; it composes with the supply-side lever from the Problem above, raising per-instance utilisation via tighter bin-packing.

Contrast with request-time load shedding

patterns/shed-low-priority-under-load is the request-time, in-path version: a live request evaluator drops low-priority requests at admission when the server is saturated. The pattern documented here is operator-time, out-of-path: the operator edits schedules, cancels jobs, and contacts customers to pause batch loads before saturation hits. Same intent (reduce demand); different control loop (humans and cron vs request admission control). In a real incident, both usually run simultaneously.
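
For contrast, a minimal sketch of the request-time version; the threshold and priority names are illustrative.

    SHED_THRESHOLD = 0.85   # start dropping when utilisation crosses this

    def admit(request_priority, current_utilisation):
        """In-path admission control: drop low-priority requests under saturation."""
        if current_utilisation < SHED_THRESHOLD:
            return True                          # healthy: admit everything
        return request_priority == "critical"    # saturated: only critical traffic

    # The pattern on this page runs outside this path entirely: an operator
    # flips flags, cancels jobs, and contacts customers hours before the peak.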

Seen in

  • sources/2025-11-03-planetscale-aws-us-east-1-incident-2025-10-20 — PlanetScale, Richard Crowley, 2025-11-03. Canonical wiki application. Phase 2 of the 2025-10-20 AWS us-east-1 incident. Three concrete shed-moves (new-DB redirect to us-east-2, backup cancellation, customer ETL/queue pause advisory) as part of the incident response. No demand-reduction numbers are disclosed; the narrative suggests that, combined with tighter bin-packing, the shedding was sufficient to cover the US-East-Coast Monday-morning peak.