CONCEPT Cited by 1 source

Metric staleness from polling layers¶

Definition¶

When a metric travels through multiple independent polling stages (agent collects from source; consumer polls agent), the worst-case age of the metric seen by the consumer is the sum of the polling intervals, not just the last one. Each added polling layer inflates the staleness ceiling.

The canonical Noach example¶

"The agent collects metrics at its own interval. For example, the agent might collect data once per second, while the throttler polls the API at its own interval, which could also be once per second. The throttler may now collect data that is up to 2 seconds stale, as opposed to up to 1 second stale in the monolithic, synchronous approach."

— Shlomi Noach, Source: sources/2026-04-21-planetscale-anatomy-of-a-throttler-part-2

Two 1 Hz pollers stacked → metric is up to 2 seconds stale worst-case, not 1.
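
The 2-second figure can be reproduced with a small deterministic model. This is a minimal sketch, not Noach's code: it assumes the agent collects on a fixed grid, the consumer polls on its own grid shifted by a phase offset, and the consumer keeps each value until its next poll. The function name `staleness_ceiling` and its parameters are hypothetical.

```python
import math


def staleness_ceiling(agent_interval, consumer_interval, consumer_phase, polls=1000):
    """Return the largest age (seconds) of the value the consumer holds.

    Assumed model: the agent collects at 0, T, 2T, ...; the consumer polls at
    consumer_phase, consumer_phase + T', ... and keeps each value until its
    next poll.
    """
    worst = 0.0
    for j in range(polls):
        poll_time = consumer_phase + j * consumer_interval
        # Most recent agent collection at or before this poll.
        sample_time = math.floor(poll_time / agent_interval) * agent_interval
        # The held value is oldest just before the consumer polls again.
        next_poll = poll_time + consumer_interval
        worst = max(worst, next_poll - sample_time)
    return worst


aligned = staleness_ceiling(1.0, 1.0, consumer_phase=0.0)
out_of_phase = staleness_ceiling(1.0, 1.0, consumer_phase=0.999)
print(aligned, round(out_of_phase, 3))
```

With the consumer polling right after each agent collection, the ceiling is 1 second. When the consumer polls 0.999 s after each collection (just before the next one), the ceiling approaches 2 seconds.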

Generalisation¶

For N polling stages with intervals T_1, T_2, … T_N:

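Best case (pollers phase-aligned, so each stage reads just after the stage before it refreshes): the staleness ceiling is max(T_i).
Worst case (pollers maximally out of phase, so each stage reads just before the stage before it refreshes): the staleness ceiling approaches sum(T_i).
In between: for any other phase relationship the ceiling falls between max(T_i) and sum(T_i), and it moves within that range as the pollers drift.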

In practice, pollers run on independent clocks with no phase coupling, so designs should budget for the sum bound.
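
As a rough illustration of budgeting for the sum bound, the two ceilings can be computed directly from the list of intervals. The helper names `staleness_bounds` and `fits_budget` are hypothetical, and the sketch assumes all intervals are given in seconds.

```python
def staleness_bounds(intervals):
    """Return (best_case, worst_case) staleness ceilings in seconds."""
    if not intervals:
        raise ValueError("at least one polling stage is required")
    # Phase-aligned ceiling is max(T_i); out-of-phase ceiling is sum(T_i).
    return max(intervals), sum(intervals)


def fits_budget(intervals, budget_seconds):
    """Check the pessimistic (sum) bound against a staleness budget."""
    _, worst = staleness_bounds(intervals)
    return worst <= budget_seconds


best, worst = staleness_bounds([1.0, 1.0])
print(best, worst)
```

For the two 1 Hz pollers this gives a best-case ceiling of 1 second and a worst-case ceiling of 2 seconds. Adding a third stage that polls every 5 seconds would raise the sum bound to 7 seconds, so `fits_budget([1.0, 1.0, 5.0], 5.0)` is false even though the largest single interval fits.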

Why designs accept the penalty anyway¶

Host-agent metrics APIs take the staleness hit on purpose because:

Direct access to every metric source from a central service doesn't scale (connection count, security boundaries).
Host-local agents can be simpler, cheaper, and independently upgraded.
2 seconds is often acceptable for the throttling decision — the throttler's reject/accept decisions are not time-critical at sub-second granularity in most database-workload cases.

When the penalty becomes painful¶

Fast-moving metrics — a replica's lag can go from 0 to 10 s in 2 seconds during an overload event; a 2-s staleness ceiling can hide that until the damage is done.
Feedback loops — if a throttler's own decision affects the metric (e.g. throttling writes affects lag), stale metrics produce oscillation.
Tight SLO windows — 2 s staleness on a 5 s lag threshold leaves only 3 s of effective headroom.
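
The arithmetic behind the first and last points can be written out as a short sketch. The function names are hypothetical, and the model is deliberately simple: it assumes lag grows linearly during the overload and that the consumer sees nothing newer than the staleness ceiling.

```python
def effective_headroom(lag_threshold_s, staleness_ceiling_s):
    """Headroom left once the staleness ceiling is subtracted from the threshold."""
    return lag_threshold_s - staleness_ceiling_s


def unseen_lag_growth(lag_growth_per_s, staleness_ceiling_s):
    """Lag that can accumulate before the consumer sees any of it."""
    return lag_growth_per_s * staleness_ceiling_s


# 5 s lag threshold with a 2 s staleness ceiling.
headroom = effective_headroom(5.0, 2.0)

# Lag climbing from 0 to 10 s over 2 s is a growth rate of 5 s per second.
hidden = unseen_lag_growth(10.0 / 2.0, 2.0)
print(headroom, hidden)
```

Under these assumptions the headroom is 3 seconds, and the full 10 seconds of lag can build up inside one staleness window, which is twice the 5-second threshold.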

Mitigations¶

Push instead of poll — agents stream metric updates directly, removing the consumer-side poll interval.
Event-driven publish — agents emit metrics on threshold crossings instead of periodically.
Tighter agent-poll and consumer-poll intervals, at the cost of extra polling load (the same axis that hibernation trades against).
Avoid agent-mediation for the hottest metrics — use direct access for the single most important signal and agent-mediation for the rest.
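
The push option can be sketched with a callback-based agent. This is an illustrative model only: `PushAgent`, `Consumer`, and `Sample` are hypothetical names, timestamps are passed in explicitly to keep the example deterministic, and delivery latency between agent and consumer is assumed to be zero.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Sample:
    value: float
    collected_at: float  # seconds


class PushAgent:
    """Agent that forwards each sample to subscribers as soon as it is collected."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Sample], None]] = []

    def subscribe(self, callback: Callable[[Sample], None]) -> None:
        self._subscribers.append(callback)

    def collect(self, value: float, now: float) -> None:
        sample = Sample(value, now)
        for callback in self._subscribers:
            callback(sample)  # no consumer-side poll interval in between


class Consumer:
    """Consumer that keeps the latest pushed sample."""

    def __init__(self) -> None:
        self.latest: Optional[Sample] = None

    def on_sample(self, sample: Sample) -> None:
        self.latest = sample

    def age(self, now: float) -> float:
        if self.latest is None:
            raise RuntimeError("no sample received yet")
        return now - self.latest.collected_at


agent = PushAgent()
consumer = Consumer()
agent.subscribe(consumer.on_sample)

agent.collect(0.2, now=0.0)       # agent collects on its 1 s grid
age_before_next = consumer.age(now=0.999)
agent.collect(0.3, now=1.0)
age_after_push = consumer.age(now=1.0)
```

In this model the consumer's value is never older than the agent's own collection interval, because the only remaining delay is the wait for the next collection. A real deployment would add network and queuing delay on top.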

Seen in¶

sources/2026-04-21-planetscale-anatomy-of-a-throttler-part-2: canonical wiki introduction. Shlomi Noach uses the 2×1 Hz polling example to show how moving from a monolithic throttler (direct metric access, up to 1 s stale) to an agent-mediated one doubles the worst-case staleness.