CONCEPT Cited by 1 source

Throttler client starvation¶

Definition¶

Throttler client starvation is the failure mode in which one client's workload holds the throttling metric above its threshold for an extended period, so the throttler rejects every other client for that entire period. Well-behaved clients make zero progress while the offending client monopolises the database.

"Let's assume the client's workload is such that it exhausts resources and causes replication lag to spike to the scale of many minutes, well beyond the throttler's threshold. Nothing pushes back this client, and it continues to hammer the database for hours. During that time, requests from all other clients are continuously rejected. This is a starvation scenario."

— Shlomi Noach, Source: sources/2026-04-21-planetscale-anatomy-of-a-throttler-part-3

Causes¶

Starvation requires one client to remain unthrottled while others are throttled: something must leave that client effectively exempt from a metric the others respect, whether or not an exemption was ever configured. Three structural causes:

Rogue / malfunctioning client. A client that simply never consults the throttler. No exemption is configured; the client just doesn't ask. This is the structural hole in the cooperative model, which only works if every client checks in.
Explicit exemption. An operator configured the throttler to grant a specific client a free pass. The exemption works as designed — and is precisely what causes starvation of everyone else.
Differential metric assignment. Client-1 throttles on metric A only; client-2 throttles on both A and B. If client-1's workload spikes metric B, client-2 is throttled while client-1 sails through. Functionally, client-1 is exempt from metric B.

"While the second client throttles based on load average, the first client is effectively exempted from checking load average. If that first client's workload is such that it does indeed push load average beyond its threshold, then the second client becomes starved. It never gets a chance to operate."
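The differential-assignment case can be illustrated with a minimal sketch. The names `Client`, `check`, and `THRESHOLDS`, and all the numeric values, are hypothetical and chosen for illustration; they are not taken from any real throttler's API. Each client is admitted only if every metric assigned to it is at or below its threshold.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the values are illustrative only.
THRESHOLDS = {"replication_lag_s": 5.0, "load_average": 8.0}


@dataclass(frozen=True)
class Client:
    name: str
    metrics: tuple  # names of the metrics this client is checked against


def check(client: Client, current: dict) -> bool:
    """Return True if the client is admitted, False if it is rejected."""
    return all(current[m] <= THRESHOLDS[m] for m in client.metrics)


# client-1 is checked against lag only; client-2 against lag and load average.
client_1 = Client("client-1", ("replication_lag_s",))
client_2 = Client("client-2", ("replication_lag_s", "load_average"))

# client-1's workload pushes load average past its threshold; lag stays healthy.
snapshot = {"replication_lag_s": 1.2, "load_average": 14.0}

admitted = {c.name: check(c, snapshot) for c in (client_1, client_2)}
print(admitted)  # {'client-1': True, 'client-2': False}
```

As long as the snapshot stays like this, client-2 is rejected on every check while client-1, which causes the load, is never pushed back.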

How it differs from backoff¶

Normal throttler operation produces brief, rotating rejections as the metric oscillates just above and below threshold — every client waits a moment and retries (see concepts/throttler-threshold-as-standard). Starvation is sustained, one-sided rejection — the metric is pinned far above threshold by one client, and all other clients are in continuous back-off.

Signature:

Per-client rejection rate for victim clients → 100% for minutes to hours.
Per-client rejection rate for offender client → 0% (not asking) or low (exempted).
System metric flat far above threshold rather than oscillating around it.
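One way to turn the per-client part of this signature into a check is a sliding window of check outcomes per client. The sketch below is an assumption about how such a detector might look, not a description of an existing tool; `RejectionWindow`, `looks_starved`, the 300-second window, and the 99% cut-off are all hypothetical.

```python
from collections import deque


class RejectionWindow:
    """Sliding window of (timestamp, rejected) check outcomes for one client."""

    def __init__(self, window_s: float):
        self.window_s = window_s
        self.events = deque()

    def record(self, ts: float, rejected: bool) -> None:
        self.events.append((ts, rejected))
        # Drop outcomes that have fallen out of the window.
        while self.events and self.events[0][0] < ts - self.window_s:
            self.events.popleft()

    def rejection_rate(self) -> float:
        if not self.events:
            return 0.0
        return sum(rejected for _, rejected in self.events) / len(self.events)


def looks_starved(win: RejectionWindow, min_checks: int = 30, rate: float = 0.99) -> bool:
    """Flag a client whose checks are (almost) all rejected across the window."""
    return len(win.events) >= min_checks and win.rejection_rate() >= rate


victim = RejectionWindow(window_s=300)
backoff = RejectionWindow(window_s=300)
for t in range(600):
    victim.record(float(t), True)         # sustained, one-sided rejection
    backoff.record(float(t), t % 2 == 0)  # ordinary back-off: rejections alternate

print(looks_starved(victim), looks_starved(backoff))  # True False
```

The window length matters: over a few seconds, ordinary back-off can also look like 100% rejection, so the window should be long relative to the normal oscillation period of the metric.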

Tolerable vs intolerable starvation¶

Not all starvation is an incident. Noach offers a qualitative waterline:

"If a client is starved for 10 minutes out of a total runtime of 12 hours, this may not be a big deal."

Thresholds depend on the workload — a 12-hour batch job losing 10 minutes is fine; a 30-second online DDL losing 10 minutes is not.
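The comparison can be made concrete by expressing the starved time as a fraction of the job's expected runtime. The helper name `starved_fraction` is hypothetical, and the source does not define a numeric cut-off; this only shows the arithmetic behind the two examples.

```python
def starved_fraction(starved_s: float, expected_runtime_s: float) -> float:
    """Starved time relative to the job's expected runtime."""
    return starved_s / expected_runtime_s


batch_job = starved_fraction(10 * 60, 12 * 3600)  # 10 minutes against 12 hours
online_ddl = starved_fraction(10 * 60, 30)        # 10 minutes against 30 seconds

print(f"{batch_job:.1%}")   # 1.4%
print(f"{online_ddl:.0%}")  # 2000%
```

The same 10 minutes is a rounding error for the batch job and a twentyfold delay for the DDL.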

Some starvation is intentional:

Incident-response fixes that must run regardless.
Essential system components that the rest of the pipeline depends on.
Very short intervals of one workload's dominance.

See concepts/throttler-exemption for the three named cases where exemption — and therefore deliberate starvation of others — is acceptable.

Mitigations¶

patterns/probabilistic-rejection-prioritization instead of exemption. The favoured client gets a lower rejection ratio, but still respects the metric-based admission decision. Starvation shrinks to the metric-oscillation window.
patterns/deprioritize-all-except-target as the favoured-client-friendly framing of the same mechanism.
patterns/time-bounded-throttler-rule — exemptions and prioritisations auto-expire, so stale rules can't silently produce persistent starvation.
patterns/enforcement-throttler-proxy — closes the rogue-client hole; even non-participating clients are subject to query-level delay.
Per-client identity + per-client rejection-rate alarming — detect starvation quickly via client-identity keyed metrics.
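The first three mitigations can be combined in one minimal sketch, under stated assumptions: a per-client rule carries a rejection ratio and an expiry, a dice roll rejects that fraction of the client's checks outright, and every surviving check still goes through the metric. The `Throttler` and `Rule` classes, the 0.9 ratio, the 60-second TTL, and the threshold are hypothetical and do not reproduce any real throttler's API.

```python
import random
import time
from dataclasses import dataclass


@dataclass
class Rule:
    reject_ratio: float  # fraction of checks rejected before the metric is consulted
    expires_at: float    # clock value after which the rule no longer applies


class Throttler:
    def __init__(self, threshold, rng=None, clock=time.monotonic):
        self.threshold = threshold
        self.rules = {}
        self.rng = rng if rng is not None else random.Random()
        self.clock = clock

    def set_rule(self, client, reject_ratio, ttl_s):
        self.rules[client] = Rule(reject_ratio, self.clock() + ttl_s)

    def check(self, client, metric_value):
        rule = self.rules.get(client)
        if rule is not None and self.clock() >= rule.expires_at:
            del self.rules[client]  # time-bounded: stale rules remove themselves
            rule = None
        if rule is not None and self.rng.random() < rule.reject_ratio:
            return False            # dice roll: deprioritised client rejected outright
        return metric_value <= self.threshold  # nobody skips the metric check


now = [0.0]  # fake clock so the example is deterministic in time
throttler = Throttler(threshold=5.0, rng=random.Random(0), clock=lambda: now[0])
throttler.set_rule("batch", reject_ratio=0.9, ttl_s=60.0)  # deprioritise "batch"

healthy, overloaded = 1.0, 300.0
favoured_ok = sum(throttler.check("favoured", healthy) for _ in range(1000))
batch_ok = sum(throttler.check("batch", healthy) for _ in range(1000))
favoured_when_overloaded = throttler.check("favoured", overloaded)

now[0] = 61.0  # the rule has expired
batch_ok_after_expiry = sum(throttler.check("batch", healthy) for _ in range(1000))
```

In this sketch the favoured client is favoured only by having no rule, which is the deprioritize-all-except-target framing. It is admitted on every healthy check, the deprioritised client on roughly one in ten, and neither is admitted once the metric is above threshold, so the favoured client cannot pin the metric there unchallenged.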

Seen in¶

sources/2026-04-21-planetscale-anatomy-of-a-throttler-part-3 — canonical wiki introduction. Shlomi Noach uses the replication-lag-pinned-at-minutes worked example to name starvation as the central risk of exemption-based prioritisation and the reason dice-roll ratio-based prioritisation is preferable.