PlanetScale: Graceful degradation in Postgres
Summary
Ben Dicken's 2026-03-31 post reframes PlanetScale Traffic
Control (already canonicalised via the 2026-04-11 Keeping a
Postgres queue healthy post) from a mixed-workload
contention lens to a graceful-degradation-under-load lens.
The worked application is a social-media platform whose traffic
(authentication, post fetching, likes, impressions,
notifications, comments, trending, DMs, etc.) is partitioned
into three priority tiers: critical (auth, post fetching, post
creation, profile), important (comments, search, DMs), and
best-effort (like/impression/bookmark counts, trending,
notifications, analytics). Each tier receives a distinct
Traffic Control budget. Under a viral-event / bad-deploy / DDoS
load spike, the best-effort tier is the first to be shed
(either automatically via its low Server share + low max
concurrent workers, or manually via a live budget-disable)
while critical keeps working. Canonical PlanetScale framing:
stop serving non-critical components for a few minutes and
users barely notice; let everything contend equally and the app
becomes unusable and users leave.
The post also canonicalises the warn → enforce operational
lifecycle for Traffic Control budgets (ship in warn mode, watch
flagged-over-budget counts in Insights, tune limits, then switch
to enforce) and the [PGINSIGHTS] Traffic Control: warning
channel delivered in-band on Postgres query responses so
applications can observe budget pressure without user-facing
impact.
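The difference between the two modes can be illustrated with a toy model. This is a minimal sketch, not Traffic Control's implementation: the `Budget` class, its `max_concurrent` field (a stand-in for "max concurrent workers"), and the `admit` method are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    WARN = "warn"
    ENFORCE = "enforce"


@dataclass
class Budget:
    name: str
    max_concurrent: int      # hypothetical stand-in for "max concurrent workers"
    mode: Mode = Mode.WARN   # budgets start in warn mode
    flagged: int = 0         # over-budget queries seen so far

    def admit(self, running: int) -> bool:
        """Decide whether one more query may start while `running` are in flight."""
        if running < self.max_concurrent:
            return True
        self.flagged += 1
        # Warn mode only counts the violation; enforce mode rejects the query.
        return self.mode is Mode.WARN


budget = Budget("best-effort-budget", max_concurrent=2)
print(budget.admit(running=2), budget.flagged)  # True 1
budget.mode = Mode.ENFORCE
print(budget.admit(running=2), budget.flagged)  # False 2
```

In this model the `flagged` counter plays the role of the flagged-over-budget count that the post says to watch in Insights before switching a budget to enforce.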
Key takeaways
- Not all traffic is created equal, and the default database load-handling model treats it as if it were. Under normal load this doesn't matter; under spike load it does. "Every query has an equal shot at consuming CPU and I/O, which means a flood of impression-count queries can starve the ones that users care most about, like authenticating and loading their timeline."
- Three-tier priority classification as a reusable template. Critical (app is broken without these: auth, post creation, post fetching, author profiles) / Important (noticeable if missing, app still usable: comments, post search, DMs; "oh hello 𝕏.com") / Best-effort (nice to have: like + impression + bookmark counts, trending topics, notifications, analytics). Canonical new concepts/query-priority-classification concept. "Your tiers will look different depending on your application. The point is to identify what you're willing to shed under pressure so that the things that matter most keep working."
- SQLCommenter as the tagging substrate: a standard for appending key=value metadata as a SQL comment, parsed by Insights, used by Traffic Control for budget selection. You pick the keys: this post uses `category=viewPost, priority=critical` (both a fine-grained `category` axis and a coarse `priority` axis on every query). Budgets can then be set at either granularity: per-category (a dozen or so budgets, tuned independently) or per-priority (three coarse budgets). The post advocates the priority-level setup as the default starting point.
- Three priority-tier budget recipe (verbatim values from the post):
  - `critical-budget`: apply to `priority='critical'` queries. No `Server share`, no `Burst limit`. Per-query max = 2 seconds (protects against rogue slow queries on critical path).
  - `important-budget`: apply to `priority='important'`. `Server share` = 25% with moderate `max concurrent workers`. "Plenty of room for comments and notifications under normal conditions, but some will be blocked when traffic is unexpectedly high." Start in warn mode, switch to enforce after tuning.
  - `best-effort-budget`: apply to `priority='bestEffort'`. `Server share` = 20% + low `max concurrent workers`. "Under normal load, this budget provides more than enough resource share for these lightweight queries." Under spike, traffic can be dynamically reduced further or completely shut off right from the PlanetScale app.
- Warn mode → enforce mode is the budget-tuning operational lifecycle, not a one-shot configuration. Canonical new concepts/warn-mode-vs-enforce-mode concept. "There's no need to get the tunings above perfect from day one. You can start every budget in `warn` mode. This will not kill any queries that exceed the budget. Rather, it will warn, and you can click into the budget to see how many queries are exceeding it over time." Only after the tuning stabilises does the budget flip to enforce. Canonical flow: comment → warn → monitor → enforce.
- In-band warning channel: over-budget events surface as `[PGINSIGHTS] Traffic Control:` warnings returned directly in the query response from Postgres so applications can observe the impact "from within your application without any user-facing effects." Canonical wiki datum: a managed database can attach diagnostic metadata to query responses alongside the actual row data, using the extension layer as the piggyback channel.
- Live budget changes as the load-shedding lever. Under a spike, "we can click into the `best-effort-budget` and completely disable this traffic. Changes to budgets happen live, so we would immediately see the impact of this." Operators do not need to deploy an application change to shed load; the budget-config surface is the lever. Canonical new patterns/shed-low-priority-under-load pattern: when capacity is exhausted, cut the lowest-priority traffic class at the infrastructure layer (budget disable), not by application-code changes.
- "Temporary degradation of non-critical functionality" vs "total outage": the architectural reframe. "What could have been a huge lost-opportunity (your app becomes unusable) is now only a temporary degradation of non-critical functionality. We've kept our users happy and avoided an application outage." The same mechanism that protects the MVCC horizon in a Postgres-queue workload (sources/2026-04-11-planetscale-keeping-a-postgres-queue-healthy) becomes the user-facing graceful-degradation lever when applied to priority-tiered user traffic. Two problems, one mechanism.
Systems extracted
- PlanetScale Traffic Control: framed here at the user-facing graceful-degradation altitude rather than the mixed-workload contention altitude. Same three dials (server share + burst, max concurrent workers, per-query limit) applied to a different problem class.
- PlanetScale Insights: the over-budget observability surface. Budget violations visible in-app; `[PGINSIGHTS]`-channel warnings returned in query responses; warn-mode traffic counts surface over a 3-hour window.
- PlanetScale Postgres: the substrate. Traffic Control is Postgres-exclusive on PlanetScale; upstream / AWS-RDS / GCP-SQL Postgres does not have this feature.
- Postgres: the engine under the application. The `[PGINSIGHTS]` warning mechanism is delivered inside the Postgres extension layer.
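How an application might pick the `[PGINSIGHTS]` warnings out of other server messages can be sketched as follows. The post does not specify the delivery mechanism, so this sketch assumes the driver exposes server messages as plain strings (for example, the `notices` list that psycopg2 keeps on a connection); whether Traffic Control warnings actually arrive there is unverified. The function name and the sample message bodies are hypothetical.

```python
WARNING_PREFIX = "[PGINSIGHTS] Traffic Control:"


def traffic_control_warnings(notices: list[str]) -> list[str]:
    """Return the text that follows the prefix in each matching message."""
    found = []
    for notice in notices:
        index = notice.find(WARNING_PREFIX)
        if index != -1:
            found.append(notice[index + len(WARNING_PREFIX):].strip())
    return found


# Hypothetical message bodies; the post does not show the exact wording.
sample = [
    "NOTICE:  [PGINSIGHTS] Traffic Control: example warning text",
    'NOTICE:  table "tmp" does not exist, skipping',
]
print(traffic_control_warnings(sample))  # ['example warning text']
```

Matching on the prefix rather than on the full message keeps the filter independent of the unspecified warning body.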
Concepts introduced
- concepts/query-priority-classification: the critical / important / best-effort three-tier scheme.
- concepts/warn-mode-vs-enforce-mode: two-state budget lifecycle (observe → tune → enforce). Warn counts violations without blocking; enforce actually rejects over-budget queries.
- concepts/sqlcommenter-query-tagging: canonical wiki page for the SQL-comment-as-metadata standard. Previously referenced implicitly across systems/planetscale-traffic-control, systems/planetscale-insights, concepts/query-tag-filter, concepts/actor-tagged-error; this post is the right place to canonicalise it as a standalone primitive.
- concepts/graceful-degradation: extended with a database-tier instance. Prior wiki coverage was Netflix-centric (Simian Army framing); this post adds the "shed the lowest-priority traffic class via a live budget change" canonical instance at the database tier.
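A minimal sketch of the tagging concept, assuming the SQLCommenter convention of sorted keys and URL-encoded, single-quoted values in a trailing comment. The helper name `tag_query` is hypothetical, and the sketch assumes the statement has no trailing semicolon; a real application would more likely rely on an existing SQLCommenter integration for its ORM or driver.

```python
from urllib.parse import quote


def tag_query(sql: str, **tags: str) -> str:
    """Append key='value' tags to `sql` as a trailing SQL comment."""
    if not tags:
        return sql
    # Sort the keys and URL-encode both keys and values before quoting.
    pairs = ",".join(
        f"{quote(key, safe='')}='{quote(str(value), safe='')}'"
        for key, value in sorted(tags.items())
    )
    return f"{sql.rstrip()} /*{pairs}*/"


tagged = tag_query(
    "SELECT * FROM posts WHERE id = %s",
    category="viewPost",
    priority="critical",
)
print(tagged)
# SELECT * FROM posts WHERE id = %s /*category='viewPost',priority='critical'*/
```

Carrying both keys on every query is what lets budgets be matched at either the `category` or the `priority` granularity.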
Patterns introduced
- patterns/shed-low-priority-under-load: the graceful-degradation-as-infrastructure pattern: classify traffic by user-perceived priority, apply per-class resource budgets that normally fit, cut the lowest-priority class at the budget layer under spike load. Sibling of the existing patterns/workload-class-resource-budget (same mechanism, different framing axis: the budget pattern is about coexistence; this pattern is about shedding).
- patterns/workload-class-resource-budget: extended with the user-facing-priority instance. Previously framed via the 2026-04-11 Postgres-queue post at the MVCC-horizon / mixed-workload axis; this post adds the three-tier user-priority canonical application.
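The shedding pattern can be illustrated with a toy admission model. This is a sketch under stated assumptions, not how Traffic Control works internally: the per-tier concurrency caps (8 and 4) are invented numbers, `Server share` is not modelled at all, and `admit`, `simulate`, and `disabled` are hypothetical names. Only the tier names follow the post's `priority` tag values.

```python
from collections import Counter

# Hypothetical per-tier concurrency caps; None means uncapped.
caps = {"critical": None, "important": 8, "bestEffort": 4}
disabled = set()  # tiers an operator has switched off


def admit(priority: str, in_flight: Counter) -> bool:
    """Admit a query unless its tier is disabled or already at its cap."""
    if priority in disabled:
        return False
    cap = caps[priority]
    if cap is not None and in_flight[priority] >= cap:
        return False
    in_flight[priority] += 1
    return True


def simulate(arrivals: list[str]) -> Counter:
    """Count admitted queries per tier for a burst that arrives all at once."""
    in_flight = Counter()
    for priority in arrivals:
        admit(priority, in_flight)
    return in_flight


burst = ["critical"] * 20 + ["important"] * 20 + ["bestEffort"] * 20
print(dict(simulate(burst)))  # {'critical': 20, 'important': 8, 'bestEffort': 4}
disabled.add("bestEffort")    # stands in for the live budget disable
print(dict(simulate(burst)))  # {'critical': 20, 'important': 8}
```

The point of the model is that the shedding decision lives in `caps` and `disabled`, which change without touching the code that issues queries, mirroring the post's claim that no application deploy is needed.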
Operational numbers
- `critical-budget`: no `Server share` / `Burst limit`; per-query cap = 2 seconds.
- `important-budget`: `Server share` = 25%; moderate `max concurrent workers`.
- `best-effort-budget`: `Server share` = 20%; low `max concurrent workers`; can be fully disabled live under spike.
- Warn-mode data collection window illustrated in the post: 3-hour window, thousands of flagged-over-budget requests for a too-restrictive budget.
- Spike scenario modelled: 10× increase in authentications, posts, likes, notifications, impressions, page loads driven by a "crazy news story or celebrity drama."
Caveats
- Illustrative / pedagogical voice. No production-customer retrospective, no measured MTTR, no measured user-retention delta from Traffic Control under a real incident, no A/B comparison with a no-priority-tiers baseline.
- Threshold-picking is declared "not hard" but the post doesn't give heuristics for the `Server share` split beyond the 25% / 20% example. The post's guidance is procedural (start in warn, tune down to where violations drop, flip to enforce) rather than numerical.
- Classification burden is pushed to the application. Every query needs a SQLCommenter tag at the call-site; untagged queries fall into an unclassified default bucket. The operational cost of "tag every query path with the right priority" is not discussed.
- Retry responsibility lives with the caller. Traffic Control blocks over-budget queries and expects the application to retry; if the application doesn't retry, throttling degrades from "smooth" to "fail." The post doesn't elaborate on retry-storm avoidance (exponential backoff, jitter) under a spike where many requests are simultaneously blocked.
- `PGINSIGHTS` warning channel mechanics not fully specified. The post does not disclose whether the warning is a `NoticeResponse` Postgres-protocol message, a header, or an Insights-specific extension point; whether warnings are delivered on every over-budget query or sampled; or whether a client library needs to know about the channel to observe them.
- No quantification of the "temporary degradation" user impact. The argument is structural ("users barely notice") rather than backed by user-engagement data for the worked shape.
- Only Postgres scope. MySQL-side Traffic Control (if it exists on PlanetScale Metal for MySQL) is not discussed; the post is explicitly Postgres-centric.
- Single-cluster scope. Multi-region / read-replica interactions with Traffic Control budgets are not discussed (do budgets apply per-instance or per-cluster? What happens during an automatic failover?).
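Since the post leaves retry-storm avoidance open, one common caller-side approach is capped exponential backoff with full jitter. The sketch below is an assumption about how a caller might retry, not guidance from the post: `QueryBlocked` is a hypothetical exception standing in for however a rejected query surfaces in the driver, and the delay parameters are illustrative.

```python
import random
import time


class QueryBlocked(Exception):
    """Hypothetical error raised when a query is rejected as over budget."""


def run_with_backoff(run_query, attempts=5, base=0.05, cap=2.0, sleep=time.sleep):
    """Retry `run_query` with capped exponential backoff and full jitter."""
    for attempt in range(attempts):
        try:
            return run_query()
        except QueryBlocked:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # Full jitter: wait a random time up to the capped exponential delay.
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))


calls = {"n": 0}


def flaky():
    """Fail twice with QueryBlocked, then succeed."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise QueryBlocked("over budget")
    return "rows"


print(run_with_backoff(flaky, sleep=lambda seconds: None))  # rows
```

Randomising the delay spreads out retries from many simultaneously blocked requests, and the bounded attempt count keeps best-effort callers from retrying indefinitely during a spike.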
Source
- Original: https://planetscale.com/blog/graceful-degradation-in-postgres
- Raw markdown: raw/planetscale/2026-04-21-graceful-degradation-in-postgres-606b5077.md
Related
- companies/planetscale
- systems/planetscale-traffic-control
- systems/planetscale-insights
- systems/planetscale-for-postgres
- systems/postgresql
- concepts/graceful-degradation
- concepts/query-priority-classification
- concepts/warn-mode-vs-enforce-mode
- concepts/sqlcommenter-query-tagging
- patterns/shed-low-priority-under-load
- patterns/workload-class-resource-budget