
SYSTEM Cited by 2 sources

PlanetScale Traffic Control

What it is

Traffic Control is a feature of the PlanetScale Insights extension that enforces per-workload-class resource budgets on PlanetScale Postgres clusters. It is the canonical wiki instance of the patterns/workload-class-resource-budget pattern.

Queries are classified by SQLCommenter metadata tags appended to the SQL (e.g. /* action=analytics */). A Resource Budget defines limits for that class; queries exceeding the budget are blocked and expected to be retried by the caller.

Three budget dials

| Dial | Controls |
|---|---|
| Server share + burst limit | Percentage of server resources + how quickly they can be consumed |
| Per-query limit | Seconds of full-server usage a single query may consume |
| Maximum concurrent workers | Percentage of max_worker_processes available to this class at any instant |

The third dial is the load-bearing one for protecting the MVCC horizon: capping a low-priority class to 1 concurrent worker opens windows where autovacuum can actually run.

What problem it solves

Upstream Postgres offers statement_timeout (7.3+), idle_in_transaction_session_timeout (9.6+), and transaction_timeout (17.0+) — all of which target individual-query duration and cannot limit workload-class concurrency. Three continuously-overlapping 40-second analytics queries keep the MVCC horizon pinned without any individual query tripping a timeout; autovacuum sees a continuously-pinned horizon and can't reclaim dead tuples produced by other workloads (e.g. a queue table on the same cluster).

Traffic Control is the class-of-mechanism upstream Postgres doesn't have: limit how many queries of a class can be active at once, not how long any one query runs.

Measured effect

In PlanetScale's stress test:

  • Workload: 800 jobs/sec producer + 3 concurrent action=analytics 120-second queries + 8 workers + 10 ms work time. 15-minute run on a PlanetScale cluster.
  • Traffic Control disabled: 155,000-job backlog, 300+ ms lock time, 383,000 dead tuples at end, VACUUM blocked. Death spiral.
  • Traffic Control enabled (analytics cap = 1 concurrent worker, 25% of max_worker_processes): 0 jobs backlog, 2 ms lock time, dead tuples cycling 0–23,000, VACUUM runs normally in the gaps, 15 analytics queries completed in 15 min. Completely stable.

The analytics reports still run to completion — just serialized instead of 3-way concurrent.

Caller requirements

Applications must implement retry logic for blocked queries. Traffic Control does not do less work; it smooths the rate at which work is performed. Without caller retry, throttling merely converts "database dies" into "queries fail."

Availability

Exclusive to PlanetScale Postgres clusters — not available in upstream Postgres, managed AWS/GCP/Azure Postgres services, or self-hosted Postgres. Upstream-compatible approximations (pgbouncer pool-mode caps + application-side rate limiting) cover some of the ground but don't recognize SQLCommenter tags natively.

Seen in

  • Canonical implementation deep-dive from Patrick Reynolds (PlanetScale, 2026-03-23), pairing with the Dicken graceful-degradation and Brown patterns posts. Canonicalises seven internal mechanisms:
    (1) Hook colocation inside the existing pginsights extension on the ExecutorRun / ProcessUtility hooks: "Traffic Control needed the same hook points and much of the same information that pginsights already had. So rather than duplicate all that code and impose the extra runtime overhead of another extension, we taught pginsights how to block queries" (patterns/hook-colocation-for-zero-overhead). The original plan had Traffic Control as a separate extension with pg_strict carrying the static-analysis half, but the dynamic / cumulative half merged into pginsights.
    (2) Four-check admission sequence at ExecutorRun: rule match → concurrency check → per-query cost check → cumulative cost check, all inside a microsecond budget.
    (3) Pre-execution blocking beats statement_timeout / cgroups: "Blocking a query just before it begins execution means the server spends no resources on the query, beyond the cost of the planner and the decision to block it. That's an improvement over schedulers like Linux cgroups, which let every task begin and simply starve them of resources if higher priority tasks exist in the system. It's also an improvement over the Postgres statement_timeout setting, which allows any overly expensive query to consume resources until it times out."
    (4) Plan cost × k as the wall-clock estimator: a new per-(query-pattern × host) k constant computed as the ratio of two pginsights EMAs (observed CPU time / observed planner cost), updated on every query completion. Canonical pattern patterns/plan-cost-times-k-estimator: use the database's own planner as a free pre-execution cost oracle, then calibrate via per-pattern EMA.
    (5) Reverse leaky bucket per budget: debt accumulates (instead of credits draining), and empty buckets are evictable from shared memory; "we inverted the model for a simple reason: an empty bucket doesn't need to be stored." Composed with lazy drain-on-read and evict-on-insert, so no sweeper thread is needed.
    (6) Rule-set O(1) rule matching: a <key, value> → rule hash map gives O(query-metadata-fields) candidate identification regardless of rule count, with three operator-visible exceptions (CIDR remote_address rules cost O(mask-length-count); conjunction rules cost O(overlapping-<k,v>); duplicate-key rules cost O(duplicates)). Three-phase dispatch: lookup → filter → budget-check.
    (7) postgresql.conf as control plane for the distribution channel: UI / API edit → planetscale DB rows → JSON-serialise into traffic_control.rules / traffic_control.budgets parameters → push to each replica's postgresql.conf file → Postgres reloads parameters (no restart required) → each worker rebuilds its rule set between queries. End-to-end propagation latency is 1–2 seconds. Chosen specifically for overload robustness: "You may want new Traffic Control rules most urgently when Postgres is using 100% of its available CPU, 100% of its worker processes, or both. Changing config files is possible even when opening a new SQL connection and issuing statements wouldn't be."

  • Canonical Go-language implementation companion framing Traffic Control as a composition of five orthogonal tagging axes rather than a single mechanism. Josh Brown (2026-04-02) canonicalises the third framing (after Griggs's mixed-workload contention and Dicken's user-perceived-priority framings) with concrete Go patterns:
    (1) Service isolation via Postgres username + application_name connection-string parameter.
    (2) Route isolation via net/http middleware injecting route=<r.Pattern> (with Go 1.22+ {/}→: normalisation): patterns/route-tagged-query-isolation.
    (3) Deployment / canary tag via startup DEPLOYMENT_TAG env-var or runtime feature-flag evaluation.
    (4) SaaS tier isolation via auth-middleware injection (tier='FREE' / 'PRO' / 'ENTERPRISE'): patterns/tier-tagged-query-isolation.
    (5) Background-job / script isolation via a dedicated connection pool with distinct application_name=background-jobs or application_name=script-<name>: patterns/dedicated-application-name-per-workload.
    All five axes compose AND-wise on one tagged query (canonical new concepts/composable-tag-axes): "Multiple matching budgets all apply simultaneously and queries must satisfy every budget they match. You can build layered policies without complicated rule logic." Also canonicalises the two operational-integration primitives:
    (a) Enforce-mode SQLSTATE 53000 with [PGINSIGHTS] Traffic Control: prefix (concepts/sqlstate-53000-traffic-control-error), detected via errors.As on *pgconn.PgError, with role-dependent caller response (analytics → 503; critical path → exponential-backoff retry 100ms → 200ms → 400ms).
    (b) Warn-mode pgx/v5 OnNotice handler catching pgconn.Notice frames in-band alongside query results, for observability without user-facing impact.
    The Go-idiomatic substrate underlying all five patterns is patterns/context-threaded-sql-tag-propagation, a framework-less context.Context-threaded counterpart to the Rails ORM-middleware pattern canonicalised via the 2022 Coutermarsh + Ekechukwu post. It has a two-helper substrate (appendTags(query, tags): deterministic + URL-encoding + SQLCommenter-format render; tagsFromContext(ctx): copy-on-read for thread-safety) plus a wrapper QueryContext / ExecContext method on the database handle that renders tags at the last moment. Canonical final framing: "The difference between a database outage and a degraded experience often comes down to whether you've decided in advance which traffic to shed. Traffic Control makes that decision explicit and configurable instead of leaving it to whichever query happens to win a resource race."

  • sources/2026-04-21-planetscale-graceful-degradation-in-postgres: Canonical user-facing graceful-degradation framing of Traffic Control. Ben Dicken reframes the same mechanism from the mixed-workload contention lens (2026-04-11 queue-health post) to the survive-a-spike-with-partial-degradation lens. Canonical three-tier budget recipe: critical (no server-share cap, per-query max = 2 sec), important (25% server share, moderate concurrent workers), best-effort (20% server share, low concurrent workers, live-disable-able under spike). Canonical new query priority classification (critical / important / best-effort) and shed-low-priority-under-load pattern. Also canonicalises two operational primitives that were previously undisclosed: (1) the warn → monitor → enforce budget-tuning lifecycle ("There's no need to get the tunings above perfect from day one. You can start every budget in warn mode… click into the budget to see how many queries are exceeding it over time"), i.e. threshold-learning before commitment; (2) the [PGINSIGHTS] Traffic Control: in-band warning channel returned inside the Postgres query response so applications can observe budget pressure "from within your application without any user-facing effects". Warn-mode observability survives alongside enforce-mode blocking via a diagnostic piggyback on the query wire protocol. Worked spike: a 10× viral-event / bad-deploy / DDoS load spike is survived by clicking into the best-effort budget and completely disabling it live: "what could have been a huge lost-opportunity (your app becomes unusable) is now only a temporary degradation of non-critical functionality." Two framings, one mechanism.

  • Canonical introduction and measured efficacy demonstration. Griggs frames Traffic Control as the structural fix for the "patterns/postgres-queue-on-same-database in a mixed-workload cluster" failure mode that autovacuum tuning and statement/transaction timeouts cannot address.

  • Co-launched with Traffic Control on the same release. The Insights enhanced-tagging release extends SQLCommenter tag visibility from the notable-query stream to the aggregate stream, which lets operators see the aggregate impact of their Traffic Control classes directly in Insights. Before this release, one could see the cap (e.g. "action=analytics limited to 1 worker") but couldn't easily get the aggregate statistics broken down by that tag; after this release, the Tags sidebar section surfaces "database-level aggregate statistics broken down by tag." This makes Traffic Control's classification substrate legible end-to-end in one UI: SQLCommenter tag → class rule → aggregate impact.
