CONCEPT

Oversampling metric interval

Rule of thumb

For a metric used as a threshold-based control-loop input, sample 2–5× faster than the threshold's timescale: for a threshold T, the sampling interval should be roughly T/5 to T/2.

Worked example from Shlomi Noach (Anatomy of a Throttler, part 1):

"Borrowing from the world of networking hardware, it is recommended that metric interval and granularity oversample the range of allowed thresholds. For example, if the acceptable replication lag is at 5 seconds, then it's best to have a heartbeat/sampling interval of 1–2 seconds."

Where the rule comes from

Networking hardware borrows this from signal processing. The Nyquist–Shannon sampling theorem says: to faithfully reconstruct a signal of bandwidth B, sample at a rate of at least 2B (the Nyquist rate). In practice, engineers sample at 2.5–5× the signal bandwidth to leave margin for anti-aliasing filters, clock jitter, and quantisation noise.

For a control loop whose decision boundary is at threshold T, the analogous statement is: to avoid missing a threshold crossing, sample fast enough that the crossing is captured within at most one sample window. If the signal can move by ~T per sample interval, aliasing is likely; if it moves by ~T/2 or less per interval, the threshold-crossing signal is faithfully captured.
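A toy simulation makes the aliasing risk concrete. The signal shape and numbers below are illustrative (not from the source): a replication-lag spike lasting 3 s against a 5 s threshold, probed at a coarse 5 s interval and an oversampled 1 s interval.

```python
THRESHOLD = 5.0  # seconds of allowed replication lag

def lag(t):
    """Synthetic lag signal: spikes to 8 s between t=11 and t=14."""
    return 8.0 if 11.0 <= t < 14.0 else 1.0

def crossings_seen(interval, horizon=30.0):
    """Count how many samples observe the metric above the threshold."""
    t, seen = 0.0, 0
    while t < horizon:
        if lag(t) > THRESHOLD:
            seen += 1
        t += interval
    return seen

# Sampling every 5 s steps right over the 3 s spike; sampling every
# 1 s observes the crossing three times.
print(crossings_seen(5.0))  # 0 — spike missed entirely
print(crossings_seen(1.0))  # 3 — spike seen at t=11, 12, 13
```

The coarse sampler never engages the throttler at all; the oversampled loop sees the crossing on the first second of the spike.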

The concrete recommendation for throttlers

For a database throttler with threshold T:

Threshold                          Recommended sampling interval
5 s replication lag                1–2 s
100 ms transaction commit delay    20–50 ms
90% pool usage                     1–2 s (if pool fill/drain is slow)

The general pattern: sampling interval ≈ threshold / 2.5 to threshold / 5.
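The pattern above is mechanical enough to express as a one-line helper. This is a sketch of the rule of thumb only; the function name and tuple return shape are illustrative, not part of any real throttler API.

```python
def recommended_interval(threshold):
    """Return the (low, high) recommended sampling interval for a
    threshold, in the same unit as the threshold: T/5 to T/2.5."""
    return threshold / 5.0, threshold / 2.5

# Reproduces the table rows above (seconds, then milliseconds):
print(recommended_interval(5.0))  # (1.0, 2.0)   — 5 s replication lag
print(recommended_interval(100))  # (20.0, 40.0) — 100 ms commit delay
```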

Why under-sampling is the common mistake

The obvious failure mode is sampling slower than the signal changes. Specific consequences:

  1. Miss the uptick. System degrades for the full sampling interval before throttler engages.
  2. Miss the recovery. Throttler blocks during the full sampling interval after metric clears.
  3. Release thundering herd. Multiple throttled jobs all see the all-clear at the same sample edge, push the metric back up in synchronised fashion, and get blocked again.

All three are cured by tightening the sampling interval below the threshold range. None are cured by picking a different threshold.
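Failure mode 3 follows directly from the shared sample edge. A toy sketch (all names and numbers illustrative): with a coarse sampling interval, every throttled job observes "metric cleared" at the same edge and resumes at once.

```python
SAMPLE_INTERVAL = 5.0  # seconds; shared by all throttled jobs

def resume_times(num_jobs, clear_time):
    """Each job next observes the metric at the first sample edge after
    the metric clears — the same edge for every job."""
    next_edge = ((clear_time // SAMPLE_INTERVAL) + 1) * SAMPLE_INTERVAL
    return [next_edge] * num_jobs

# Metric clears at t=12 s; all three jobs resume together at t=15 s,
# pushing the metric straight back up in synchronised fashion.
print(resume_times(3, 12.0))  # [15.0, 15.0, 15.0]
```

Shrinking SAMPLE_INTERVAL spreads the resumes closer to the actual clear time; changing the threshold moves the clear time but leaves the synchronised edge intact.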

Why over-sampling has a cost

The other direction isn't free either:

  • More heartbeats → more writes on the primary; heartbeat stream becomes meaningful load on the changelog itself.
  • More reads on replicas to capture heartbeats at higher rate.
  • More storage of time-series samples.

Noach acknowledges this in passing and defers it to a later post: "that, too, comes at a cost, which we will discuss in a later post."

Distinct from time-series downsampling

This concept is about control-loop input sampling, not about time-series visualisation downsampling. The latter (showing a 1-s metric as 5-min averages in Grafana) is about reducing display density, not about the control loop's decision latency. The two can and should be set independently — you can sample at 1 s for the throttler and display at 5 min for the dashboard.
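The independence of the two rates can be sketched in a few lines. This is an illustrative example, not any particular dashboard's API: the control loop consumes every raw 1 s sample, while display-side downsampling averages them after the fact.

```python
samples_1s = [1.0] * 300  # 5 minutes of 1 s lag samples (illustrative)

def display_downsample(samples, window):
    """Average consecutive windows for display only — the control loop
    still sees every raw sample."""
    return [sum(samples[i:i + window]) / window
            for i in range(0, len(samples), window)]

# The throttler iterates over all 300 raw samples; the dashboard
# renders a single 5-minute average.
print(display_downsample(samples_1s, 300))  # [1.0]
```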

Seen in

  • sources/2026-04-21-planetscale-anatomy-of-a-throttler-part-1 — canonical wiki introduction. Noach borrows the rule from networking hardware and applies it explicitly to heartbeat / sampling interval design for replication-lag-based throttlers. Key quote: "metric interval and granularity oversample the range of allowed thresholds."