CONCEPT Cited by 1 source
Oversampling metric interval¶
Rule of thumb¶
For a metric used as a threshold-based control-loop input, sample 2–5× faster than the timescale of the threshold you care about; that is, a sampling interval of roughly T/5 to T/2 for threshold T.
Worked example from Shlomi Noach (Anatomy of a Throttler, part 1):
"Borrowing from the world of networking hardware, it is recommended that metric interval and granularity oversample the range of allowed thresholds. For example, if the acceptable replication lag is at 5 seconds, then it's best to have a heartbeat/sampling interval of 1–2 seconds."
Where the rule comes from¶
Networking hardware borrows this from signal processing. The Nyquist–Shannon sampling theorem says: to faithfully reconstruct a signal of bandwidth B, sample at a rate greater than 2B. In practice, engineers sample at 2.5–5× the signal bandwidth to leave margin for anti-aliasing filters, clock jitter, and quantisation noise.
For a control loop whose decision boundary is at threshold T, the analogous statement is: to avoid missing a threshold crossing, sample at a rate high enough that the crossing is captured within at most one sample window. If the signal can move by ~T per sample interval, aliasing is likely; if it moves by ~T/2 or less per interval, the threshold-crossing signal is faithfully captured.
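The aliasing argument can be made concrete with a toy signal (everything below is illustrative; the spike shape and numbers are mine, not from the source): a lag spike that exceeds a 5 s threshold for only ~3 s is invisible to a sampler polling every 5 s, but not to one polling every 1 s.

```python
THRESHOLD = 5.0  # acceptable replication lag, seconds

def lag(t):
    """Toy lag signal (illustrative): a triangular spike peaking at 8 s
    around t = 12; it exceeds the 5 s threshold only for t in (10.5, 13.5)."""
    return max(0.0, 8.0 - 2.0 * abs(t - 12.0))

def spike_detected(interval):
    """Poll the metric every `interval` seconds over [0, 30] and report
    whether any sample exceeds the threshold."""
    t = 0.0
    while t <= 30.0:
        if lag(t) > THRESHOLD:
            return True
        t += interval
    return False

print(spike_detected(1.0))  # True: fine sampling sees the spike
print(spike_detected(5.0))  # False: interval == threshold aliases the spike away
```

The coarse sampler lands at t = 10 and t = 15, both outside the above-threshold window, so the throttler would never engage.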
The concrete recommendation for throttlers¶
For a database throttler with threshold T:
| Threshold | Recommended sampling interval |
|---|---|
| 5 s replication lag | 1–2 s |
| 100 ms transaction commit delay | 20–50 ms |
| 90% pool usage | 1–2 s (if pool fill/drain is slow) |
The general pattern: sampling interval ≈ threshold / 2.5 to threshold / 5.
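A minimal helper for the pattern (the function name is mine, not from the source); note the table's 100 ms row stretches the upper end slightly, to T/2 rather than T/2.5:

```python
def recommended_interval(threshold):
    """Oversampling rule of thumb from the text: sampling interval
    between threshold/5 and threshold/2.5 (same units as the threshold)."""
    return (threshold / 5.0, threshold / 2.5)

print(recommended_interval(5.0))    # (1.0, 2.0): the 1-2 s heartbeat for 5 s lag
print(recommended_interval(0.100))  # roughly (0.02, 0.04): 20-40 ms for 100 ms
```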
Why under-sampling is the common mistake¶
The obvious failure mode is sampling slower than the signal changes. Specific consequences:
- Miss the uptick. System degrades for the full sampling interval before throttler engages.
- Miss the recovery. Throttler blocks during the full sampling interval after metric clears.
- Release-thundering-herd. Multiple throttled jobs all see the clear at the same sample edge, push the metric back up in synchronised fashion, get blocked again.
All three are cured by tightening the sampling interval below the threshold range. None are cured by picking a different threshold.
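The first two failure modes share one bound, which can be stated as a tiny calculation (the framing is mine): when the metric crosses just after a sample, the crossing is only visible at the next sample, so the worst-case engage or release delay is one full sampling interval.

```python
import math

def detection_delay(crossing_time, interval):
    """Seconds between the metric crossing the threshold and the first
    subsequent sample that can observe it (samples at 0, interval, 2*interval...)."""
    next_sample = math.ceil(crossing_time / interval) * interval
    return next_sample - crossing_time

print(detection_delay(10.1, 5.0))  # ~4.9 s blind spot with a 5 s interval
print(detection_delay(10.1, 1.0))  # ~0.9 s with a 1 s interval
```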
Why over-sampling has a cost¶
The other direction isn't free either:
- More heartbeats → more writes on the primary; the heartbeat stream becomes meaningful load on the changelog itself.
- More reads on the replicas to capture heartbeats at the higher rate.
- More time-series samples to store.
Noach acknowledges this in passing and defers it to a later post: "that, too, comes at a cost, which we will discuss in a later post."
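The write-amplification side of that cost is easy to quantify in isolation (my arithmetic, not figures from the source): each halving of the interval doubles the heartbeat writes on the primary.

```python
def heartbeats_per_day(interval_s):
    """Heartbeat writes the primary absorbs per day at a given sampling
    interval (writes only; replica reads scale the same way)."""
    return round(86_400 / interval_s)

print(heartbeats_per_day(5.0))   # 17280 at a relaxed 5 s interval
print(heartbeats_per_day(1.0))   # 86400 at the 1 s interval the rule suggests
print(heartbeats_per_day(0.05))  # 1728000 if a 100 ms threshold demanded 50 ms sampling
```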
Distinct from time-series downsampling¶
This concept is about control-loop input sampling, not about time-series visualisation downsampling. The latter (showing a 1-s metric as 5-min averages in Grafana) is about reducing display density, not about the control loop's decision latency. The two can and should be set independently — you can sample at 1 s for the throttler and display at 5 min for the dashboard.
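The independence claim can be sketched as two consumers of the same sample stream (function names and numbers are mine): the throttler makes one decision per raw sample, while the dashboard path averages the same samples into coarse buckets; changing one path does not touch the other.

```python
def throttle_decisions(samples, threshold):
    """One throttle/allow decision per raw sample -- decision latency is
    set by the raw sampling interval."""
    return ["throttle" if s > threshold else "allow" for s in samples]

def downsample_for_display(samples, bucket):
    """Average raw samples into bucket-sized means for the dashboard --
    purely a display-density choice."""
    return [sum(samples[i:i + bucket]) / len(samples[i:i + bucket])
            for i in range(0, len(samples), bucket)]

lag_samples = [1.0, 2.0, 6.0, 7.0, 2.0, 1.0]           # six raw 1 s samples
print(throttle_decisions(lag_samples, threshold=5.0))  # per-sample decisions
print(downsample_for_display(lag_samples, bucket=3))   # two coarse display points
```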
Seen in¶
- sources/2026-04-21-planetscale-anatomy-of-a-throttler-part-1 — canonical wiki introduction. Noach borrows the rule from networking hardware and applies it explicitly to heartbeat / sampling interval design for replication-lag-based throttlers. Key quote: "metric interval and granularity oversample the range of allowed thresholds."
Related¶
- concepts/metric-sampling-interval — parent concept; covers the stale-sample / jitter / release-thundering-herd problems that oversampling addresses.
- concepts/database-throttler — the use case.
- patterns/heartbeat-based-replication-lag-measurement — the mechanism whose interval parameter this rule constrains.
- concepts/replication-lag — the canonical metric where the oversampling rule is applied in practice.