CONCEPT Cited by 1 source
Throttler threshold as standard
Observation
When a workload is aggressive enough to push against a throttler's threshold for extended periods, the threshold itself becomes the steady-state value of the metric. The metric graph is:
- Not flat below the threshold. That would only happen if the workload were too light to push the metric up to the threshold.
- Not above the threshold. The throttler is actively keeping the metric at or near the line.
"As we start the operation, we expect to see the replication-lag graph jump up to the threshold value, and then more or less stabilize around that value, slightly higher and slightly lower, for the duration of the operation, which could be hours."
"No matter how many more concurrent operations we run, we expect to contain replication lag at about the same slight offset above or below the threshold."
— Shlomi Noach, Anatomy of a Throttler, part 1
What this looks like in production
During the operation:
- The metric crosses the threshold upward → throttler starts rejecting.
- Workload backs off → metric drops.
- Throttler starts accepting again → workload pushes up.
- Repeat thousands or millions of times over the operation duration.
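The accept/reject cycle in the list above can be reproduced with a toy model. The sketch below is not taken from the source: the `simulate` function, the per-tick lag increment, and the drain rate are all hypothetical, chosen only so that an aggressive workload outpaces the replica while a gentle one does not.

```python
def simulate(demand_per_tick, threshold=5.0, drain_per_tick=0.5, ticks=10_000):
    """Toy throttler loop (all parameters are illustrative assumptions).

    Each accepted request adds `demand_per_tick` seconds of lag;
    the replica catches up by `drain_per_tick` seconds every tick.
    """
    lag = 0.0
    accepted = rejected = 0
    samples = []
    for _ in range(ticks):
        if lag < threshold:
            # Throttler check passes: the workload does one unit of work.
            lag += demand_per_tick
            accepted += 1
        else:
            # Throttler check fails: the workload backs off for this tick.
            rejected += 1
        lag = max(0.0, lag - drain_per_tick)
        samples.append(lag)

    steady = samples[ticks // 2:]  # skip the initial ramp-up
    return {
        "mean_lag": sum(steady) / len(steady),
        "max_lag": max(steady),
        "accepted": accepted,
        "rejected": rejected,
    }


aggressive = simulate(demand_per_tick=2.0)  # outpaces the replica
gentle = simulate(demand_per_tick=0.3)      # replica keeps up easily

print("aggressive:", aggressive)
print("gentle:    ", gentle)
```

Under these assumed parameters, the aggressive run should settle into a tight band around the 5-second threshold with thousands of both accepts and rejects, while the gentle run never trips the throttler.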
"The operation will be granted access thousands of times or more, and will likewise also be rejected access thousands of times or more. That is how a healthy system looks with a throttler engaged."
The reframing: threshold is the new SLO
Noach's key observation:
"It is not uncommon for a system to run one or two operations for very long periods, which means what we consider as the throttling threshold (say, a 5 sec replication lag) becomes the actual standard."
Implication for operators:
- The throttler threshold is not an upper bound on metric excursions; it is the expected steady-state value whenever a workload pushes against it.
- Whatever a metric held at this value costs downstream consumers (failover promotion time, read-your-writes freshness, alerts keyed to "metric > threshold") is the cost the operator accepted by setting the threshold.
Design implications
- Set the threshold against what you're willing to accept as steady state, not against what you're willing to tolerate as a rare excursion. If a 5-second replication-lag threshold is fine for short bursts but ruinous as a persistent condition, set it lower.
- Beware thresholds that are "fine most of the time". A threshold set on the assumption that the metric rarely reaches it will surface as a behaviour change the first time a large workload pushes against it, and the metric will stay there for as long as that workload runs.
- Consider tiered thresholds. A soft threshold (start slowing) below a hard threshold (stop entirely) gives a control-loop shape that gently pushes the workload away from the ceiling rather than pinning it there.
- Consider workload-specific thresholds. A shared threshold for all jobs means one aggressive job can pin the metric at the ceiling, penalising every other user of the replica even when they're doing nothing wrong. See patterns/workload-class-resource-budget for per-class alternatives.
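The last two suggestions can be sketched together. This is a minimal illustration, not the source's design: the linear ramp between the soft and hard thresholds, the function names, the workload class names, and the threshold values are all assumptions.

```python
def allowed_fraction(metric, soft, hard):
    """Fraction of requested work to admit.

    1.0 at or below the soft threshold, 0.0 at or above the hard
    threshold, and a linear ramp in between (the ramp shape is an
    assumption; any monotonically decreasing curve would do).
    """
    if soft >= hard:
        raise ValueError("soft threshold must be below hard threshold")
    if metric <= soft:
        return 1.0
    if metric >= hard:
        return 0.0
    return (hard - metric) / (hard - soft)


# Hypothetical per-class (soft, hard) thresholds, in seconds of replication lag.
THRESHOLDS = {
    "online-ddl": (2.0, 5.0),
    "bulk-import": (1.0, 3.0),
}


def admit_fraction(workload_class, lag_seconds):
    """Look up the class's thresholds and apply the tiered rule."""
    soft, hard = THRESHOLDS[workload_class]
    return allowed_fraction(lag_seconds, soft, hard)


print(admit_fraction("online-ddl", 2.0))   # at the soft threshold: full speed
print(admit_fraction("bulk-import", 2.0))  # halfway up its ramp
print(admit_fraction("bulk-import", 3.0))  # at the hard threshold: stopped
```

With per-class thresholds, the bulk import in this sketch is slowed and then stopped before the lag reaches the level at which the online DDL class would be cut off entirely, so one aggressive class is less likely to pin the shared metric at every other class's ceiling.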
Soft landing
Not every workload is aggressive enough to push against the threshold:
"Thankfully, not all operations and workloads are so aggressive that they necessarily push the metrics as high as their thresholds."
A small import job on a well-provisioned cluster may keep the metric well below the threshold, so the throttler never engages and the workload runs at full speed. The threshold-as-standard observation applies specifically to threshold-bound workloads; for workloads that never reach the threshold, it is a safety net, not a ceiling.
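One hypothetical way to tell the two cases apart is the share of throttler checks that were rejected. The function names and the 5% cutoff below are assumptions for illustration, not values from the source.

```python
def rejection_ratio(accepted, rejected):
    """Share of throttler checks that were rejected (0.0 if no checks ran)."""
    total = accepted + rejected
    return rejected / total if total else 0.0


def is_threshold_bound(accepted, rejected, min_ratio=0.05):
    """Treat a workload as threshold-bound once rejections are a
    non-trivial share of its checks. `min_ratio` is an arbitrary,
    illustrative cutoff."""
    return rejection_ratio(accepted, rejected) >= min_ratio


print(is_threshold_bound(accepted=2_500, rejected=7_500))  # aggressive job
print(is_threshold_bound(accepted=10_000, rejected=0))     # small import
```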
Seen in
- sources/2026-04-21-planetscale-anatomy-of-a-throttler-part-1 — canonical wiki framing. Noach names this dynamic explicitly in the section following his walk-through of replication lag as a throttling signal and uses it to motivate the discussion of sampling interval and oversampling in the next section.
Related
- concepts/database-throttler — the system property this behaviour emerges from.
- concepts/replication-lag — canonical worked example.
- concepts/metric-sampling-interval — the lower the sampling interval, the tighter the steady-state oscillation around the threshold.
- concepts/oversampling-metric-interval — the design response.
- concepts/symptom-vs-cause-metric — the threshold is a symptom threshold, not a root-cause threshold; the dynamic preserves that.