# Metric sampling interval

## Definition
Metric sampling interval is the period between successive measurements of a metric. A metric captured once per second has a sampling interval of 1 s; a metric captured on demand has a sampling interval bounded by the query frequency.
For signals used in control loops (throttlers, autoscalers, alert evaluators), the sampling interval is a parameter of the control loop, not just of the observability system. Intervals that are too long cause stale decisions; intervals that are too short cost CPU, network, and storage.
## The stale-sample problem
If the throttler's decision is based on a sample that is T seconds old, the throttler is controlling the system as it was T seconds ago, not as it is now. With long intervals:

- Miss the uptick. A load spike lands between samples; the system degrades for up to ~T seconds before the throttler engages.
- Miss the recovery. When the metric clears, the throttler keeps blocking until the next sample captures the new value, wasting up to ~T seconds of available capacity.
- Worked example (Noach, Anatomy of a Throttler, part 1): a heartbeat is injected at 12:00:00.000 and sampled at 12:00:00.995; a client checking at 12:00:01.990 gets a response based on that sample, which is by then ~1 s old and which itself captured a heartbeat that was already ~1 s old. The throttler is reading a ~1-s-old version of a ~1-s-old metric, so the client acts on a value ~2 s behind reality.
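The staleness arithmetic in the worked example can be made explicit. A minimal sketch; the timestamps come from the example above, the variable names are mine:

```python
# Staleness accounting for the worked example: heartbeat injected at
# 12:00:00.000, heartbeat table sampled at 12:00:00.995, client check
# at 12:00:01.990 (all offsets in seconds from the heartbeat).
heartbeat_time = 0.000   # heartbeat written on the primary
sample_time    = 0.995   # monitor samples the replica's heartbeat table
check_time     = 1.990   # client asks the throttler "may I proceed?"

# Age of the metric at the moment it was sampled: the sample captures
# a heartbeat that is already ~1 s old.
metric_age_at_sample = sample_time - heartbeat_time

# Age of the cached sample at the moment the client checks: another ~1 s.
sample_age_at_check = check_time - sample_time

# Total staleness of the value the client acts on: ~2 s.
total_staleness = metric_age_at_sample + sample_age_at_check
print(f"{total_staleness:.3f}")  # → 1.990
```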
## The jitter problem
Even with a 1-s interval, the phase of the sample against the
event stream matters. A heartbeat injected at t=0.000 and sampled
at t=0.995 is ~1 s behind reality at the moment of sampling, and
~2 s behind by the time a client reads the cached value.
## Granularity is bounded by the slowest step
For concepts/replication-lag measurement, the effective granularity is the largest of:
- The heartbeat-injection interval (how often the primary writes a detectable event),
- The sampling interval (how often the replica monitors the heartbeat table),
- The staleness of the last read (how recently the throttler refreshed its cached value).
All three must be tightened to tighten the overall observability latency.
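The bound reduces to a simple max over the three steps. A sketch with illustrative interval values (not figures from the source):

```python
# Effective observability granularity is bounded by the slowest of the
# three steps. Values below are illustrative.
heartbeat_interval = 0.1   # s: how often the primary injects a heartbeat
sampling_interval  = 1.0   # s: how often the monitor reads the replica
cache_staleness    = 2.0   # s: max age of the throttler's cached value

# Tightening any one step alone does not help: the max dominates.
effective_granularity = max(heartbeat_interval,
                            sampling_interval,
                            cache_staleness)
print(effective_granularity)  # → 2.0
```

Here the heartbeat is injected 10× per second, yet the throttler's view is still 2 s coarse because the cached value is the slowest link.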
## The oversampling rule of thumb
Networking-hardware design carries a rule of thumb: sample at 2–5× the rate of the signal you care about to avoid aliasing. For throttler design, this becomes:
"If the acceptable replication lag is at 5 seconds, then it's best to have a heartbeat/sampling interval of 1–2 seconds." — Noach
See concepts/oversampling-metric-interval for the full articulation.
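The rule of thumb can be expressed as a one-line calculation. A sketch; the function name and the bounds check on the factor are my own framing of the 2–5× rule:

```python
def suggested_interval(acceptable_lag_s: float, oversample: float = 5.0) -> float:
    """Sampling interval implied by an oversampling factor of 2-5x."""
    if not 2.0 <= oversample <= 5.0:
        raise ValueError("rule of thumb covers oversampling factors of 2-5x")
    return acceptable_lag_s / oversample

# For a 5 s acceptable replication lag, the rule yields a 1-2.5 s
# interval, in line with Noach's 1-2 s recommendation.
print(suggested_interval(5.0, oversample=5.0))  # → 1.0
print(suggested_interval(5.0, oversample=2.0))  # → 2.5
```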
## The release-thundering-herd problem
Long sampling intervals plus multiple throttled jobs plus a shared metric cause synchronised release:
```
T: 0s      metric > threshold, all jobs blocked
T: 5s      metric drops; sample still shows old value
T: 10s     next sample fires; all jobs see "clear" simultaneously
T: 10s+ε   all jobs push concurrent subtasks; metric spikes
T: 10s+2ε  all jobs blocked again
```
Shorter intervals smooth this out by giving different jobs slightly different pictures of the metric at their individual check moments, desynchronising the release burst.
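The timeline above can be sketched as a small simulation. This is a sketch, not the source's model: the poll period, job count, and random phases are illustrative assumptions. It shows the long-interval case holding every job blocked ~5 s past the actual clear and then releasing them all in one burst:

```python
import random

def first_clear_seen(sample_interval, clear_time, poll_period=0.1, phase=0.0):
    """Time at which a polling job first observes a post-clear sample.

    The metric truly clears at clear_time, but a job only ever sees the
    most recent sample, taken at multiples of sample_interval.
    """
    t = phase
    while (t // sample_interval) * sample_interval < clear_time:
        t += poll_period
    return t

rng = random.Random(0)
phases = [rng.uniform(0, 0.1) for _ in range(5)]  # 5 throttled jobs

# Long interval: all 5 jobs stay blocked until the t=10 sample and then
# release together, ~5 s after the metric actually cleared.
long_releases = [first_clear_seen(10.0, 5.0, phase=p) for p in phases]

# Short interval: each job releases just after t=5, as soon as the first
# post-clear sample lands at its own check moment.
short_releases = [first_clear_seen(1.0, 5.0, phase=p) for p in phases]

print([round(t, 2) for t in long_releases])
print([round(t, 2) for t in short_releases])
```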
## Cost of shorter intervals
- Observability storage scales linearly with sampling rate.
- Monitor-side CPU (computing the metric) scales linearly.
- Source-side cost (e.g. heartbeat writes on the primary) scales linearly — every heartbeat is one extra write that ships through every replica's changelog. At high fan-out, this becomes a non-trivial overhead on the replication substrate.
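The linear scaling can be made concrete. Illustrative numbers only; the three cost categories mirror the bullets above:

```python
SECONDS_PER_DAY = 86_400

def daily_costs(interval_s: float, replicas: int) -> dict:
    """Per-day costs that scale linearly with sampling rate."""
    samples = SECONDS_PER_DAY / interval_s
    return {
        "samples_stored": samples,                          # observability storage
        "metric_computations": samples,                     # monitor-side CPU
        "replicated_heartbeat_writes": samples * replicas,  # source-side writes
    }

print(daily_costs(2.0, replicas=10))
print(daily_costs(0.5, replicas=10))  # quartering the interval -> 4x every cost
```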
The trade-off is explicit: "lower intervals and more accurate metrics reduce spikes and spread the workload more efficiently. That, too, comes at a cost, which we will discuss in a later post."
## Seen in
- sources/2026-04-21-planetscale-anatomy-of-a-throttler-part-1 — canonical wiki framing. Noach walks through the stale-sample problem with a worked timing example (heartbeat at 1/s, sample at 1/s, ~2 s staleness outcome) and introduces the oversampling rule of thumb. Links to concepts/oversampling-metric-interval for the specific recommendation.
## Related
- concepts/oversampling-metric-interval — the specific sample-above-threshold rule of thumb.
- patterns/heartbeat-based-replication-lag-measurement — the mechanism where sampling-interval trade-offs are most directly visible.
- concepts/database-throttler — the use case where sampling interval becomes a control-loop parameter.
- concepts/replication-lag — canonical metric where the sampling-interval trade-off plays out.