
PATTERN

Shed load during capacity shortage

Problem

During a capacity-provisioning outage (e.g. EC2 launch failure), the fleet is frozen at its current size and a peak-traffic window is approaching. The fleet must absorb the peak with the capacity it already has.

Two knobs are available: raise per-instance utilisation (via tighter bin-packing), or reduce the demand that the frozen fleet has to serve. This pattern is the demand-reduction side of the capacity balance.
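
To make the balance concrete, here is a back-of-the-envelope check in Python. Every number below is hypothetical; the incident post discloses no figures.

    # Capacity balance for a frozen fleet (all numbers illustrative).
    # Constraint: peak_demand <= fleet_size * per_instance_capacity * max_utilisation
    fleet_size = 400              # instances, frozen by the provisioning outage
    per_instance_capacity = 100   # req/s per instance at 100% utilisation
    max_utilisation = 0.70        # normal bin-packing target

    supply = fleet_size * per_instance_capacity * max_utilisation  # 28,000 req/s
    peak_demand = 30_000                                           # forecast peak, req/s
    shortfall = peak_demand - supply                               # 2,000 req/s

    # Lever 1: tighter bin-packing (raising utilisation to 0.75 yields 30,000 req/s).
    print(fleet_size * per_instance_capacity * 0.75 >= peak_demand)  # True
    # Lever 2 (this pattern): shed >= 2,000 req/s of deferrable demand instead.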

Solution

Identify non-critical or deferrable demand on the affected fleet and remove it from the hot path for the duration of the incident. The specific mechanism depends on who controls the demand — operator-side demand (backups, ETLs, maintenance jobs) is directly cancellable; customer-side demand requires a request to the caller.

Three canonical cuts, quoted verbatim from PlanetScale's post on the 2025-10-20 incident:

  • Temporarily disallowed creating new databases in AWS us-east-1 and changed the default region for new databases to AWS us-east-2.
  • Delayed scheduling additional backups and canceled pending backups that were waiting to launch an EC2 instance.
  • Advised PlanetScale Managed customers using vtgate autoscaling to shed whatever load they could by e.g. delaying queue processing or pausing ETL processes.

The last bullet is the customer-side request — explicit coaching that queues and ETLs are the first things to defer when a downstream system is capacity-constrained.

Mechanics

Three categories of sheddable load:

Operator-controlled background work

  • Scheduled backups — delay or cancel. Backups are durability infrastructure; missing a backup cycle is usually survivable (RPO degrades briefly; the next backup captures everything).
  • Scheduled migrations / rebalances / reshards — pause. These are future-capacity work; they can wait.
  • Compaction / vacuum / cleanup jobs — throttle or pause. They consume the same compute the hot path needs. (A minimal gate sketch follows this list.)
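
A minimal sketch of the operator-side gate, assuming a flag-driven scheduler; the flag name, job shapes, and deferrable set are illustrative, not PlanetScale's implementation.

    import datetime

    # Background work the scheduler may defer during a capacity shortage
    # (illustrative set).
    DEFERRABLE_JOB_TYPES = {"backup", "migration", "rebalance", "compaction"}

    def should_launch(job, incident_flags):
        """Decide at schedule time whether a background job may launch."""
        shortage = incident_flags.get("capacity_shortage")
        if shortage and job["type"] in DEFERRABLE_JOB_TYPES:
            # Defer, don't delete: re-queue the job for after the incident.
            job["not_before"] = shortage["expected_end"]
            return False
        return True

    # Flipping one flag stops the cron loop from feeding the frozen fleet.
    flags = {"capacity_shortage": {"region": "us-east-1",
                                   "expected_end": datetime.datetime(2025, 10, 20, 18, 0)}}
    print(should_launch({"type": "backup", "db": "prod-1"}, flags))   # False: deferred
    print(should_launch({"type": "restore", "db": "prod-2"}, flags))  # True: not deferrable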

New-resource creation

  • Redirect defaults to unaffected regions — new databases go to us-east-2 instead of us-east-1 until us-east-1 provisioning returns.
  • Block / queue new-resource requests in the affected region. "Temporarily disallowed creating new databases in AWS us-east-1." (Both moves are sketched after this list.)
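
Both moves fit in one placement check in the control plane. A minimal sketch, assuming a single entry point for new-database creation; the names (AFFECTED_REGIONS, DEFAULT_REGION, RegionUnavailable) are illustrative.

    AFFECTED_REGIONS = {"us-east-1"}   # provisioning is broken here
    DEFAULT_REGION = "us-east-2"       # temporary default while us-east-1 is frozen

    class RegionUnavailable(Exception):
        pass

    def place_new_database(requested_region=None):
        """Choose a region for a new database under the incident policy."""
        if requested_region is None:
            return DEFAULT_REGION      # redirect the default to an unaffected region
        if requested_region in AFFECTED_REGIONS:
            # Block rather than queue: creation here would demand capacity
            # the frozen fleet cannot provision.
            raise RegionUnavailable(
                f"new databases temporarily disallowed in {requested_region}")
        return requested_region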

Customer-side demand

  • Ask customers with deferrable batch loads to pause them. The canonical examples: ETL pipelines, async queue processors, analytics ingestion, cron-triggered reports.
  • Ask customers on autoscaling tiers to temporarily disable growth-triggering jobs. If the fleet can't grow, don't trigger the autoscaler. (A customer-side worker sketch follows this list.)
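
On the customer side the mechanism is just a pause that preserves the backlog. A minimal sketch, assuming a queue worker that polls an advisory flag published by the provider; the flag name and worker shape are illustrative.

    import queue
    import time

    def process(msg):
        """Hypothetical per-message ETL step."""
        print("processed", msg)

    def run_etl_worker(work_queue, provider_advisory, poll_seconds=60):
        """Drain the queue, but pause (never drop) while the provider asks for shed."""
        while True:
            if provider_advisory.get("shed_deferrable_load"):
                # Deferrable by design: messages stay queued, so pausing trades
                # latency for headroom on the provider's capacity-frozen fleet.
                time.sleep(poll_seconds)
                continue
            process(work_queue.get())  # blocks until a message is available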

When this is right

  • The shed load has a legitimate low-priority classification. Backups, ETLs, analytics, maintenance — all deferrable for hours or a day with no real-world harm.
  • The peak demand is known and concentrated. Shedding buys headroom for the peak; once the peak passes, the shed load can be made up.
  • Customer communication channels are available. Shedding customer-side load requires reaching customers; if the status page is down (see SaaS dependency risk) the shed request may not reach them in time.

When this is wrong

  • The shed load has real-time SLOs. A trading fleet can't shed its order-processing queue; a messaging fleet can't pause its delivery path; a payment fleet can't defer authorization.
  • Shedding creates correctness risk. Pausing a scheduled garbage collection may delay a durability guarantee; pausing a cache warm-up may cause a cache-hit-rate cliff later.
  • The load is already shed and there's nothing more to cut.

Composition

This is the demand-reduction lever of the incident-response playbook; it composes with the supply-side lever from the Problem above, raising per-instance utilisation via tighter bin-packing.

Contrast with request-time load shedding

patterns/shed-low-priority-under-load is the request-time, in-path version: a live request evaluator drops low-priority requests at admission when the server is saturated. The pattern documented here is operator-time, out-of-path: the operator edits schedules, cancels jobs, and contacts customers to pause batch loads before saturation hits. Same intent (reduce demand); different control loop (humans and cron vs request admission control). In a real incident, both usually run simultaneously.
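
For contrast, a minimal sketch of the request-time version; the threshold and priority names are illustrative.

    SHED_THRESHOLD = 0.85   # start dropping when utilisation crosses this

    def admit(request_priority, current_utilisation):
        """In-path admission control: drop low-priority requests under saturation."""
        if current_utilisation < SHED_THRESHOLD:
            return True                          # healthy: admit everything
        return request_priority == "critical"    # saturated: only critical traffic

    # The pattern on this page runs outside this path entirely: an operator
    # flips flags, cancels jobs, and contacts customers hours before the peak.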

Seen in

  • sources/2025-11-03-planetscale-aws-us-east-1-incident-2025-10-20 — PlanetScale, Richard Crowley, 2025-11-03. Canonical wiki application. Phase 2 of the 2025-10-20 AWS us-east-1 incident. Three concrete shed-moves (new-DB redirect to us-east-2, backup cancellation, customer ETL/queue pause advisory) as part of the incident response. No demand-reduction numbers are disclosed; the narrative suggests that, combined with tighter bin-packing, the shedding was sufficient to cover the US-East-Coast Monday-morning peak.