
PATTERN Cited by 1 source

Shed low-priority under load

Problem

Under a capacity-exhausting load spike (viral event, bad deploy, DDoS), every query contends for the same finite resources (CPU, I/O, connections, worker processes). Without differentiation, the queries users care most about — authentication, loading the timeline — are treated identically to queries users wouldn't miss — impression counters, trending widgets. A flood of lightweight low-value traffic can starve the high-value traffic. The user-facing outcome is a total outage: the app is unusable, users leave.

A total outage is not the worst thing the database can do under spike. A partial outage — non-essential features degraded, core features still working — is strictly better, and often unnoticeable to users because non-essential features are named non-essential for a reason.

Solution

Classify traffic by user-perceived priority, give each class its own resource budget, shed the lowest-priority class when capacity is exhausted. The three-move shape:

  1. Classify. Tag every query, at the application layer, with a priority tier reflecting user-perceived feature importance. See concepts/query-priority-classification for the canonical three-tier scheme (critical / important / best-effort). Tagging uses SQLCommenter or equivalent SQL-comment metadata.
  2. Budget. Assign each priority tier a resource budget sized so that all tiers coexist comfortably under normal load. The budget acts as admission control: over-budget queries are rejected, and the caller is expected to retry.
  3. Shed. Under spike, the budget dials already shrink the lowest-priority tier automatically. The operator can also live-disable or further tighten the lowest-priority budget to reclaim capacity for higher tiers — no application deploy required.
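The classify step above can be sketched as follows. This is a minimal illustration of SQLCommenter-style tagging (a trailing SQL comment carrying `key='value'` pairs with URL-encoded values); the `tag_query` helper and the `priority` key name are assumptions for this sketch, not a confirmed API.

```python
import urllib.parse

def tag_query(sql: str, priority: str) -> str:
    """Hypothetical helper: append a SQLCommenter-style comment carrying the
    priority tier, so a database-side budget layer can classify the query.
    Values are URL-encoded per the SQLCommenter convention."""
    value = urllib.parse.quote(priority)
    return f"{sql} /*priority='{value}'*/"

# Application code tags every query at issue time:
print(tag_query("SELECT * FROM posts WHERE user_id = 42", "critical"))
```

Because the tag rides inside the SQL text itself, it survives connection poolers and proxies without protocol changes, which is what makes database-side classification possible.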

Canonical three-tier budget example

Ben Dicken's canonical social-media instance (Source: sources/2026-04-21-planetscale-graceful-degradation-in-postgres):

  • Critical (auth, post creation, post fetch, profiles) — Budget dials: no server-share cap, no burst cap; per-query max = 2 seconds. Rationale: critical traffic must always be served; the per-query cap protects against rogue slow queries on the critical path.
  • Important (comments, search, DMs) — Budget dials: server share = 25%; moderate max concurrent workers. Rationale: plenty of room under normal load, some queries blocked under spike; can be starved only when critical fills the box.
  • Best-effort (like/impression/bookmark counts, trending, notifications, analytics) — Budget dials: server share = 20%; low max concurrent workers; live-disable-able under spike. Rationale: first thing cut when capacity is tight; under extreme spike, fully disabled from the budget UI.

Under a 10× viral-event spike, operators click into the best-effort budget and completely disable this traffic. Changes happen live. Users temporarily stop receiving notifications and seeing impression counts but keep authenticating and viewing posts. "What could have been a huge lost-opportunity (your app becomes unusable) is now only a temporary degradation of non-critical functionality."
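The budget-and-shed mechanics above can be sketched as an in-process admission controller. This is an illustrative model only (PlanetScale's Traffic Control enforces budgets server-side; the `TierBudget` class and its fields are assumptions for this sketch):

```python
import threading

class TierBudget:
    """Illustrative per-tier admission control: a bounded worker-slot pool
    plus a live kill switch, mirroring the best-effort tier's budget UI."""
    def __init__(self, max_workers=None, enabled=True):
        self.sem = threading.Semaphore(max_workers) if max_workers else None
        self.enabled = enabled  # operator can flip this live, no deploy

    def try_admit(self) -> bool:
        if not self.enabled:
            return False  # tier is shed outright
        if self.sem is None:
            return True   # uncapped (critical tier)
        return self.sem.acquire(blocking=False)  # over budget: reject

    def release(self):
        if self.sem is not None:
            self.sem.release()

budgets = {
    "critical":    TierBudget(max_workers=None),  # never capped
    "important":   TierBudget(max_workers=50),
    "best_effort": TierBudget(max_workers=10),
}

# Under a spike, the operator disables best-effort live:
budgets["best_effort"].enabled = False
assert budgets["critical"].try_admit()         # always served
assert not budgets["best_effort"].try_admit()  # shed
```

The key property the sketch captures is that shedding is a configuration flip on the budget, not a code change on the query paths.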

Why this pattern is load-bearing

  • Capacity management is unavoidable — databases have finite CPU / I/O / connection pools; under enough load, somebody's query has to be blocked. The only choice is which.
  • User-perceived priority is the right answer to "which." Cost-based shedding, rate-based shedding, and SLA-based shedding all fail the "which queries do users care about?" test in common cases.
  • The alternative is a whole-system outage. Treating all traffic equally under spike makes the whole app fail together. Users don't distinguish "some impression counts are stale" (the gracefully-degraded outcome) from "the whole site is down" (the ungraceful-degradation outcome) — they only notice the latter.
  • Budget changes are live-deployable. No application rollout, no risky config push, no deploy pipeline between the operator and the shed lever.

Relationship to workload-class-resource-budget

The workload-class resource budget pattern is the mechanism; this pattern is one of its applications. Both patterns use the same primitives (SQLCommenter tag → budget mapping → over-budget rejection with retry), but the framing axis differs:

  • Goal — Workload-class resource budget: coexistence of multiple workload classes on one cluster. Shed low-priority under load: survive a spike with degraded-not-dead service.
  • Threat model — Workload-class resource budget: mixed-workload contention pins the MVCC horizon, starves autovacuum, monopolises I/O. Shed low-priority under load: a traffic spike exhausts capacity; low-priority traffic starves high-priority traffic.
  • Classification axis — Workload-class resource budget: workload type (transactional / analytics / job-queue / batch). Shed low-priority under load: user-perceived feature importance (critical / important / best-effort).
  • Canonical source — Workload-class resource budget: sources/2026-04-11-planetscale-keeping-a-postgres-queue-healthy. Shed low-priority under load: sources/2026-04-21-planetscale-graceful-degradation-in-postgres.
  • Mechanism — both: systems/planetscale-traffic-control.

Same mechanism, two framings. A single Traffic Control configuration can serve both goals simultaneously — classify on both axes, budget on both.

Warn mode before enforce

The budget-tuning lifecycle is warn → monitor → enforce (Source: sources/2026-04-21-planetscale-graceful-degradation-in-postgres). Threshold-picking is a distribution-learning problem; operators observe over-budget counts in warn mode before committing to a threshold that actually rejects requests.
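The warn stage can be sketched as counting would-be rejections without actually rejecting, so the operator learns the over-budget distribution before flipping to enforce. The `Mode` and `Budget` names are illustrative, not the product's API:

```python
from enum import Enum

class Mode(Enum):
    WARN = "warn"        # count over-budget queries, admit them anyway
    ENFORCE = "enforce"  # actually reject over-budget queries

class Budget:
    def __init__(self, limit: int, mode: Mode = Mode.WARN):
        self.limit, self.mode = limit, mode
        self.in_flight = 0
        self.over_budget_count = 0  # what the operator watches in warn mode

    def admit(self) -> bool:
        if self.in_flight >= self.limit:
            self.over_budget_count += 1
            if self.mode is Mode.ENFORCE:
                return False  # rejected; caller retries with backoff
        self.in_flight += 1
        return True

b = Budget(limit=2, mode=Mode.WARN)
assert all(b.admit() for _ in range(5))  # warn mode never rejects
assert b.over_budget_count == 3          # but it records the distribution
```

Once the observed over-budget count at a candidate threshold looks acceptable, the operator switches the mode to enforce with the threshold already validated against real traffic.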

Requirements on the caller

  • Every query path must carry a priority tag. Untagged queries fall into a default bucket (typically unconstrained, which undermines the whole scheme under load).
  • Callers must retry on budget rejection. Throttling without retry converts "database survives the spike" into "database survives but half the requests failed." Retry must use exponential backoff with jitter to avoid retry storms.
  • Operators need to know which tier is safe to shed. This is a product-organisation decision, not a database decision — engineers cannot decide unilaterally that notifications are best-effort without product agreement.
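The caller-side retry requirement can be sketched with exponential backoff and full jitter. The `BudgetRejected` exception and the retry limits here are assumptions for illustration, not the real client library's interface:

```python
import random
import time

class BudgetRejected(Exception):
    """Illustrative: raised when the database rejects an over-budget query."""

def with_retry(run_query, max_attempts=5, base=0.05, cap=2.0):
    for attempt in range(max_attempts):
        try:
            return run_query()
        except BudgetRejected:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with full jitter: spread retries out so
            # rejected callers don't all hammer the database in lockstep.
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))

attempts = 0
def flaky():
    global attempts
    attempts += 1
    if attempts < 3:
        raise BudgetRejected()
    return "ok"

assert with_retry(flaky) == "ok" and attempts == 3
```

Without the jitter, every blocked caller retries on the same schedule and the retry wave itself becomes a synchronized mini-spike.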

Related patterns

  • patterns/budget-enforced-quota-throttle (Pinterest Piqama) — Kubernetes-resource quotas with the same "over-budget users are throttled, within-budget is free" shape, at container-scheduler altitude rather than database altitude.
  • patterns/probabilistic-rejection-prioritization (Vitess throttler) — differential rejection by priority class at the rate-limiter tier.
  • patterns/deprioritize-all-except-target (Vitess throttler) — inverse of this pattern applied to a specific workflow (protect the target by holding everything else back).
  • CDN rate-limiting tiers — Cloudflare / Fastly offer per-zone rules that reject low-priority paths first under origin-backpressure signals.
  • Kubernetes priority classes + preemption — same structural shape at the pod-scheduler tier.

When it doesn't apply

  • All traffic is equally critical. A payment-processing system has no best-effort queries; every transaction is critical. The pattern still applies to internal tooling (reporting, analytics) but not to the core payment path.
  • No spare capacity even for critical tier. The pattern is a re-allocation lever, not a capacity creation lever. If critical alone exceeds capacity, sharding or scaling is the answer.
  • Classification is impossible or unreliable. If the application cannot tag queries (legacy code path, third-party library, stored procedures from a monolith), the pattern has nothing to enforce on.

Seen in

  • sources/2026-04-21-planetscale-graceful-degradation-in-postgres — Canonical wiki instance. Ben Dicken walks a social-media platform through the critical / important / best-effort partition and the 25% / 20% / unrestricted server-share allocation, demonstrating how a 10× viral-event spike survives with "temporary degradation of non-critical functionality" rather than "app becomes unusable."