SYSTEM Cited by 2 sources
PlanetScale Traffic Control¶
What it is¶
Traffic Control is a feature of the PlanetScale Insights extension that enforces per-workload-class resource budgets on PlanetScale Postgres clusters. It is the canonical wiki instance of the patterns/workload-class-resource-budget pattern.
Queries are classified by SQLCommenter metadata tags appended to
the SQL (e.g. /* action=analytics */). A Resource Budget
defines limits for that class; queries exceeding the budget are
blocked and expected to be retried by the caller.
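A minimal sketch of how a caller might append such a tag. The `appendTags` name comes from the Go companion post cited under "Seen in"; the signature and body here are assumptions. The rendering follows the SQLCommenter convention of sorted keys and URL-encoded, single-quoted values, which differs slightly from the unquoted shorthand in the example above.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
	"strings"
)

// appendTags renders tags as a trailing SQLCommenter-style comment.
// Keys are sorted so the same tag set always yields the same SQL text.
func appendTags(query string, tags map[string]string) string {
	if len(tags) == 0 {
		return query
	}
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		// PathEscape percent-encodes quotes, '*' and '/', so a value
		// cannot terminate the comment early.
		parts = append(parts, fmt.Sprintf("%s='%s'", url.PathEscape(k), url.PathEscape(tags[k])))
	}
	return query + " /*" + strings.Join(parts, ",") + "*/"
}

func main() {
	tags := map[string]string{"action": "analytics"}
	fmt.Println(appendTags("SELECT count(*) FROM orders", tags))
}
```

Deterministic ordering matters here because query-pattern statistics are easier to aggregate when identical tag sets produce identical comment text.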
Three budget dials¶
| Dial | Controls |
|---|---|
| Server share + burst limit | Percentage of server resources + how quickly they can be consumed |
| Per-query limit | Seconds of full-server usage a single query may consume |
| Maximum concurrent workers | Percentage of max_worker_processes available to this class at any instant |
The third dial is the load-bearing one for protecting the MVCC horizon: capping a low-priority class to 1 concurrent worker opens windows where autovacuum can actually run.
What problem it solves¶
Upstream Postgres offers statement_timeout (7.3+),
idle_in_transaction_session_timeout (9.6+), and
transaction_timeout (17.0+) — all of which target individual-query
duration and cannot limit workload-class concurrency. Three
continuously-overlapping 40-second analytics queries keep the MVCC
horizon pinned without any individual query tripping a timeout;
autovacuum sees a continuously-pinned horizon and can't reclaim
dead tuples produced by other workloads (e.g. a queue table on
the same cluster).
Traffic Control is the class-of-mechanism upstream Postgres doesn't have: limit how many queries of a class can be active at once, not how long any one query runs.
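To make the distinction concrete, here is an illustrative application-side sketch of a per-class concurrency cap. It is not PlanetScale's implementation: the `classLimiter` type and its methods are hypothetical names, and the cap is expressed as an absolute count rather than a percentage of `max_worker_processes`.

```go
package main

import (
	"fmt"
	"sync"
)

// classLimiter caps how many queries of each workload class run at once.
type classLimiter struct {
	mu     sync.Mutex
	limits map[string]int // class -> max concurrent queries (hypothetical config)
	active map[string]int // class -> queries currently running
}

func newClassLimiter(limits map[string]int) *classLimiter {
	return &classLimiter{limits: limits, active: make(map[string]int)}
}

// tryAcquire admits a query if its class is under its cap. Classes with no
// configured limit are always admitted. It never waits: a rejected caller
// is expected to retry later.
func (l *classLimiter) tryAcquire(class string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	limit, capped := l.limits[class]
	if capped && l.active[class] >= limit {
		return false
	}
	l.active[class]++
	return true
}

// release marks one query of the class as finished.
func (l *classLimiter) release(class string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.active[class] > 0 {
		l.active[class]--
	}
}

func main() {
	l := newClassLimiter(map[string]int{"analytics": 1})
	fmt.Println(l.tryAcquire("analytics")) // first report runs
	fmt.Println(l.tryAcquire("analytics")) // overlapping report is rejected
	l.release("analytics")
	fmt.Println(l.tryAcquire("analytics")) // admitted once the first has finished
	fmt.Println(l.tryAcquire("queue"))     // uncapped class is unaffected
}
```

Note that nothing in this check looks at query duration: a long report is admitted as readily as a short one, and only overlap within the class is refused.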
Measured effect¶
In PlanetScale's stress test:
- Workload: 800 jobs/sec producer + 3 concurrent `action=analytics` 120-second queries + 8 workers + 10 ms work time. 15-minute run on a PlanetScale cluster.
- Traffic Control disabled: 155,000-job backlog, 300+ ms lock time, 383,000 dead tuples at end, VACUUM blocked. Death spiral.
- Traffic Control enabled (analytics cap = 1 concurrent worker, 25% of `max_worker_processes`): 0 jobs backlog, 2 ms lock time, dead tuples cycling 0–23,000, VACUUM runs normally in the gaps, 15 analytics queries completed in 15 min. Completely stable.
The analytics reports still run to completion — just serialized instead of 3-way concurrent.
Caller requirements¶
Applications must implement retry logic for blocked queries. Traffic Control does not reduce the total work done; it smooths the rate at which work is performed. Without caller retry, throttling converts "database dies" into "queries fail."
Availability¶
Exclusive to PlanetScale Postgres clusters — not available in upstream Postgres, managed AWS/GCP/Azure Postgres services, or self-hosted Postgres. Upstream-compatible approximations (pgbouncer pool-mode caps + application-side rate limiting) cover some of the ground but don't recognize SQLCommenter tags natively.
Seen in¶
- Canonical implementation deep-dive from Patrick Reynolds (PlanetScale, 2026-03-23), pairing with the Dicken graceful-degradation and Brown patterns posts. Canonicalises seven internal mechanisms:
  1. Hook-colocation inside the existing `pginsights` extension on the `ExecutorRun` / `ProcessUtility` hooks: "Traffic Control needed the same hook points and much of the same information that `pginsights` already had. So rather than duplicate all that code and impose the extra runtime overhead of another extension, we taught `pginsights` how to block queries" (patterns/hook-colocation-for-zero-overhead). The original plan had Traffic Control as a separate extension with `pg_strict` carrying the static-analysis half, but the dynamic / cumulative half merged into `pginsights`.
  2. Four-check admission sequence at `ExecutorRun`: rule match → concurrency check → per-query cost check → cumulative cost check, all inside a microsecond budget.
  3. Pre-execution blocking beats `statement_timeout` / cgroups: "Blocking a query just before it begins execution means the server spends no resources on the query, beyond the cost of the planner and the decision to block it. That's an improvement over schedulers like Linux cgroups, which let every task begin and simply starve them of resources if higher priority tasks exist in the system. It's also an improvement over the Postgres `statement_timeout` setting, which allows any overly expensive query to consume resources until it times out."
  4. Plan cost × `k` as the wall-clock estimator: canonical new per-(query-pattern × host) `k` constant computed as the ratio of two `pginsights` EMAs (observed CPU time / observed planner cost), updated on every query completion. Canonical pattern patterns/plan-cost-times-k-estimator: use the database's own planner as a free pre-execution cost oracle, then calibrate via per-pattern EMA.
  5. Reverse leaky bucket per budget: debt accumulates (instead of credits draining), and empty buckets are evictable from shared memory; "we inverted the model for a simple reason: an empty bucket doesn't need to be stored." Composed with lazy drain-on-read and evict-on-insert, so no sweeper thread is needed.
  6. Rule-set O(1) rule matching: a `<key, value>` → rule hash map gives O(query-metadata-fields) candidate identification regardless of rule count, with three operator-visible exceptions (CIDR `remote_address` rules cost O(mask-length-count); conjunction rules cost O(overlapping-`<k,v>`); duplicate-key rules cost O(duplicates)). Three-phase dispatch: lookup → filter → budget-check.
  7. `postgresql.conf` as control plane for the distribution channel: UI / API edit → `planetscale` DB rows → JSON-serialise into `traffic_control.rules` / `traffic_control.budgets` parameters → push to each replica's `postgresql.conf` file → Postgres reloads parameters (no restart required) → each worker rebuilds its rule set between queries. 1–2-second end-to-end propagation latency. Chosen specifically for overload robustness: "You may want new Traffic Control rules most urgently when Postgres is using 100% of its available CPU, 100% of its worker processes, or both. Changing config files is possible even when opening a new SQL connection and issuing statements wouldn't be."
- Canonical Go-language implementation companion framing Traffic Control as a composition of five orthogonal tagging axes rather than a single mechanism. Josh Brown (2026-04-02) canonicalises the third framing (after Griggs's mixed-workload contention and Dicken's user-perceived-priority framings) with concrete Go patterns:
  1. Service isolation via Postgres username + `application_name` connection-string parameter.
  2. Route isolation via `net/http` middleware injecting `route=<r.Pattern>` (with Go 1.22+ `{/}` → `:` normalisation): patterns/route-tagged-query-isolation.
  3. Deployment / canary tag via startup `DEPLOYMENT_TAG` env-var or runtime feature-flag evaluation.
  4. SaaS tier isolation via auth-middleware injection (`tier='FREE'` / `'PRO'` / `'ENTERPRISE'`): patterns/tier-tagged-query-isolation.
  5. Background-job / script isolation via a dedicated connection pool with distinct `application_name=background-jobs` or `application_name=script-<name>`: patterns/dedicated-application-name-per-workload.

  All five axes compose AND-wise on one tagged query (canonical new concepts/composable-tag-axes): "Multiple matching budgets all apply simultaneously and queries must satisfy every budget they match. You can build layered policies without complicated rule logic."

  Also canonicalises the two operational-integration primitives: (a) Enforce-mode [[concepts/sqlstate-53000-traffic-control-error|SQLSTATE 53000 with [PGINSIGHTS] Traffic Control: prefix]] detected via `errors.As` on `*pgconn.PgError`, with role-dependent caller response (analytics → `503`; critical path → exponential-backoff retry 100ms → 200ms → 400ms); (b) Warn-mode `pgx/v5` `OnNotice` handler catching `pgconn.Notice` frames in-band alongside query results for observability without user-facing impact.

  The Go-idiomatic substrate underlying all five patterns is patterns/context-threaded-sql-tag-propagation, a framework-less `context.Context`-threaded counterpart to the Rails ORM-middleware pattern canonicalised via the 2022 Coutermarsh + Ekechukwu post. It has a two-helper substrate (`appendTags(query, tags)`: deterministic + URL-encoding + SQLCommenter-format render; `tagsFromContext(ctx)`: copy-on-read for thread-safety) plus a wrapper `QueryContext` / `ExecContext` method on the database handle that renders tags at the last moment.

  Canonical final framing: "The difference between a database outage and a degraded experience often comes down to whether you've decided in advance which traffic to shed. Traffic Control makes that decision explicit and configurable instead of leaving it to whichever query happens to win a resource race."
- sources/2026-04-21-planetscale-graceful-degradation-in-postgres: Canonical user-facing graceful-degradation framing of Traffic Control. Ben Dicken reframes the same mechanism from the mixed-workload contention lens (2026-04-11 queue-health post) to the survive-a-spike-with-partial-degradation lens. Canonical three-tier budget recipe: critical (no server-share cap, per-query max = 2 sec), important (25% server share, moderate concurrent workers), best-effort (20% server share, low concurrent workers, live-disable-able under spike). Canonical new query priority classification (critical / important / best-effort) and shed-low-priority-under-load pattern. Also canonicalises two operational primitives that were previously undisclosed:
  1. The warn → monitor → enforce budget-tuning lifecycle ("There's no need to get the tunings above perfect from day one. You can start every budget in warn mode… click into the budget to see how many queries are exceeding it over time"), i.e. threshold-learning before commitment.
  2. The `[PGINSIGHTS] Traffic Control:` in-band warning channel returned inside the Postgres query response so applications can observe budget pressure "from within your application without any user-facing effects". Warn-mode observability survives alongside enforce-mode blocking via a diagnostic piggyback on the query wire protocol.

  Worked spike: a 10× viral-event / bad-deploy / DDoS load spike is survived by clicking into the best-effort budget and completely disabling it live: "what could have been a huge lost-opportunity (your app becomes unusable) is now only a temporary degradation of non-critical functionality." Two framings, one mechanism.
- Canonical introduction and measured efficacy demonstration. Griggs frames Traffic Control as the structural fix for the "patterns/postgres-queue-on-same-database in a mixed-workload cluster" failure mode that autovacuum tuning and statement/transaction timeouts cannot address.
- Co-launched with Traffic Control on the same release. The Insights enhanced-tagging release extends SQLCommenter tag visibility from the notable-query stream to the aggregate stream, which lets operators see the aggregate impact of their Traffic Control classes directly in Insights: before this release, one could see the cap (e.g. "`action=analytics` limited to 1 worker") but couldn't easily get the aggregate statistics broken down by that tag; after this release, the Tags sidebar section surfaces "database-level aggregate statistics broken down by tag." Makes Traffic Control's classification substrate legible end-to-end in one UI: SQLCommenter tag → class rule → aggregate impact.
Source¶
Related¶
- systems/planetscale-insights
- systems/planetscale-for-postgres
- systems/planetscale
- systems/postgresql
- concepts/mvcc-horizon
- concepts/postgres-autovacuum
- concepts/postgres-queue-table
- concepts/aggregate-tag-attribution
- concepts/graceful-degradation
- concepts/query-priority-classification
- concepts/warn-mode-vs-enforce-mode
- concepts/sqlcommenter-query-tagging
- concepts/context-propagated-sql-tags
- concepts/sqlstate-53000-traffic-control-error
- concepts/postgres-notice-warning-channel
- concepts/composable-tag-axes
- concepts/postgres-hook
- concepts/reverse-leaky-bucket
- concepts/plan-cost-to-wallclock-constant-k
- concepts/lazy-leaky-bucket-update
- concepts/rule-set-o1-rule-matching
- concepts/postgresql-conf-as-control-plane
- patterns/workload-class-resource-budget
- patterns/shed-low-priority-under-load
- patterns/context-threaded-sql-tag-propagation
- patterns/route-tagged-query-isolation
- patterns/tier-tagged-query-isolation
- patterns/dedicated-application-name-per-workload
- patterns/dynamic-cardinality-reduction-by-tag-collapse
- patterns/hook-colocation-for-zero-overhead
- patterns/plan-cost-times-k-estimator
- companies/planetscale