PATTERN Cited by 1 source
Workload-class resource budget¶
Problem¶
A database cluster runs multiple workload classes concurrently: high-priority transactional traffic, medium-priority user-facing analytics, low-priority scheduled reports, job-queue workers. The classes compete for the same resources (CPU, I/O, backends, worker processes) but have different priorities and different blast radii if they fall behind.
Per-query timeouts (statement_timeout, transaction_timeout) can
kill runaway individuals but cannot limit how many queries from a
class can run concurrently. Continuously overlapping short queries
from a low-priority class defeat all timeout-based controls — no
individual query hits a duration limit, yet the class as a whole
keeps the MVCC horizon pinned, starves
autovacuum, or monopolizes I/O.
Autovacuum tuning can't help either — it has nothing it can legally reclaim while the horizon is pinned.
Solution¶
Tag queries with a workload class (typically via query-string metadata or a session variable) and apply a resource budget to each class with at least one of three dials:
- Maximum concurrent workers / backends — percentage of
max_worker_processesthat class can consume at any instant. Primary lever for protecting the MVCC horizon: capaction=analyticsto e.g. 1 concurrent worker and the horizon opens between queries. - Server share + burst limit — fraction of CPU/I/O the class may consume, with a burst ceiling. Smooths spikes from a class that occasionally runs heavy queries.
- Per-query limit — seconds of full-server resource use per individual query, independent of wall-clock time.
Queries that exceed a class's budget are blocked and the caller is expected to retry later, not killed outright. The total work happens, just at a rate the higher-priority class can coexist with.
Callers must implement retry logic — throttling without retry just moves the failure from "database dies" to "queries fail."
Tagging mechanism¶
Typical implementation: each class tags queries with SQLCommenter metadata appended to the SQL — e.g.
The resource-budget engine parses the comment, maps action=analytics
to a budget definition, and enforces the dials. Non-tagged queries
default to the "unclassified" bucket (typically unconstrained).
Canonical instance: PlanetScale Traffic Control¶
systems/planetscale-traffic-control is the canonical wiki instance — part of the Insights extension on PlanetScale Postgres. All three dials (server share + burst, per-query limit, max concurrent workers) are configurable per-budget. SQLCommenter tags select the budget.
Measured outcome in Griggs's 800-jobs/sec stress test with 3 concurrent overlapping 120-sec analytics queries (Source: sources/2026-04-11-planetscale-keeping-a-postgres-queue-healthy):
| Metric | Disabled | Enabled (analytics cap = 1) |
|---|---|---|
| Queue backlog | 155,000 jobs | 0 jobs |
| Lock time | 300+ ms | 2 ms |
| Dead tuples at end | 383,000 | 0–23,000 cycling |
| Analytics queries | 3 concurrent, overlapping | 1 at a time, 2 retrying |
| VACUUM effectiveness | Blocked (horizon pinned) | Normal (windows between queries) |
| Outcome | Death spiral | Completely stable |
Related mechanisms in other contexts¶
- patterns/budget-enforced-quota-throttle (Pinterest Piqama) applies the same idea at Kubernetes-resource layer: quota-managed consumers must stay within an assigned budget or throttle.
- AWS RDS Performance Insights + AWS Proactive Insights surface the telemetry but don't enforce class-level concurrency caps.
- Postgres
ALTER USER ... CONNECTION LIMITis the coarsest version at the role granularity. - pgbouncer pool-mode
pool_sizeper database/user can approximate concurrency caps at the proxy tier.
Each is a narrower version of the general shape: tag work by class, budget resources per class, retry-block excess.
When it doesn't apply¶
- Single-workload clusters — no classes to distinguish.
- Low-churn databases — MVCC horizon is a non-issue so the primary motivator is absent.
- Workloads where every query is equal-priority (all-or-nothing throughput is the objective).
Seen in¶
- sources/2026-04-11-planetscale-keeping-a-postgres-queue-healthy — Canonical wiki instance. Griggs argues that autovacuum tuning and Postgres-upstream timeouts are structurally insufficient for mixed-workload clusters because they target duration, not concurrency. Traffic Control is the pattern-instantiating mechanism, measured to convert a death-spiral to complete stability at 800 jobs/sec + overlapping analytics.