PATTERN
Idle-state throttler hibernation¶
Pattern¶
When a throttler and its associated metric-generation mechanisms (notably replication heartbeats) go unused for a period, slow or stop their activity to cut self-cost. On the next client request, re-ignite: resume normal probing rates, heartbeat injection, and peer communication. The first few checks during the re-ignition window read stale data and will likely reject the caller, who must retry.
Why it's needed¶
A throttler is itself a load source. Running continuously at full probing, gossip, and heartbeat-write rates imposes three costs:
- Metric-collection load on every DB / OS source.
- Inter-throttler communication load in distributed deployments.
- Binlog volume from heartbeat injection. This is the dominant cost at scale: "It is not uncommon to see MySQL deployments where the total size of binary logs is larger than the actual data set." Every heartbeat row persists to the binlog, replays on every replica, and is backed up.
Background-job workloads are the canonical trigger — hours of work interleaved with hours of idle. Running the throttler at full tilt during idle hours is pure waste.
"The throttler can choose to slow down based on lack of requests. It could either stop collecting metrics altogether and go into hibernation, or it might just slow down its normal pace. It would take a client checking the throttler to re-ignite the high-frequency collection of metrics."
— Shlomi Noach, Source: sources/2026-04-21-planetscale-anatomy-of-a-throttler-part-2
Mechanism¶
Idle detection: no client check for T_idle seconds.
Slowdown transitions:
- Normal → slow: e.g. probe rate 1 Hz → 0.1 Hz; heartbeat 1 Hz → 0.01 Hz.
- Slow → hibernate: stop probing entirely; stop heartbeat injection entirely.
Re-ignition trigger: first client check after hibernation.
Re-ignition actions:
- Resume metric probing at normal cadence.
- Resume heartbeat injection on the primary.
- Wake peer throttlers in a distributed deployment — see below.
- Wait for replication to carry the fresh heartbeats and for metric probes to complete a full cycle.
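The mechanism above can be sketched as a small state machine. This is a minimal illustration, not the Vitess implementation; the class name, thresholds, and return values are all invented for the example.

```python
import time

T_IDLE_S = 300          # assumed: no client check for this long -> step down
NORMAL_HZ = 1.0         # probe + heartbeat rate when active (illustrative)
SLOW_HZ = 0.1           # reduced rate in the "slow" state (illustrative)

class Throttler:
    """Sketch of the normal -> slow -> hibernating transitions."""

    def __init__(self, now=time.monotonic):
        self.now = now                  # injectable clock, for testing
        self.last_check = now()
        self.state = "normal"

    def tick(self):
        """Called periodically by the throttler's own loop."""
        idle = self.now() - self.last_check
        if idle >= 2 * T_IDLE_S:
            self.state = "hibernating"  # stop probing and heartbeats entirely
        elif idle >= T_IDLE_S:
            self.state = "slow"         # e.g. probe 1 Hz -> 0.1 Hz
        return self.state

    def check(self):
        """A client check; re-ignites the throttler if it was asleep."""
        was_cold = self.state != "normal"
        self.last_check = self.now()
        self.state = "normal"           # resume full-rate probing + heartbeats
        # A cold check runs on stale metrics and should reject conservatively.
        return "reject-stale" if was_cold else "evaluate-metrics"
```

Note the asymmetry: stepping down is driven by the throttler's own tick loop, while stepping back up is driven entirely by the client's check.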
Client-visible cold-start window:
"The first check, and likely also the next few checks, will run on stale data and potentially reject requests that would otherwise be accepted."
Duration: "a few seconds to get to a fully active operation." Client retry is the load-bearing compensation.
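Since client retry is the load-bearing compensation, a client might wrap its work in a retry loop like the following sketch. `check_throttler` and the parameter values are stand-ins for whatever check API and timing the deployment actually uses.

```python
import time

def throttled_call(check_throttler, do_work, retries=10, backoff_s=0.5):
    """Retry through the cold-start window of stale-data rejections."""
    for _ in range(retries):
        # The first call re-ignites a hibernating throttler; the next few
        # may still be rejected on stale data.
        if check_throttler():
            return do_work()
        time.sleep(backoff_s)   # cold start lasts "a few seconds"
    raise RuntimeError("throttler kept rejecting; giving up")
```

A handful of retries with sub-second backoff comfortably covers the "few seconds to get to a fully active operation" window.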
Coordinated re-ignition across distributed throttlers¶
"With a distributed throttler design, throttlers which depend on each other should be able to inform each other upon being checked. All throttlers who communicate with each other should re-ignite upon the first request to any of them."
The first-touch client re-ignites not just the throttler it asks but its peers too — otherwise the aggregator sees a stale roll-up until each peer is independently re-ignited.
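A sketch of that peer broadcast, under the assumption that each throttler holds references (or RPC stubs) to the peers it depends on; the class and method names are illustrative only.

```python
class PeerThrottler:
    """Sketch: the first check after hibernation wakes this node and its peers."""

    def __init__(self, name):
        self.name = name
        self.hibernating = True
        self.peers = []          # dependent throttlers to inform on wake

    def check(self):
        if self.hibernating:
            self.wake()
            for p in self.peers:   # one request re-ignites the whole group
                p.wake()
        # ...evaluate metrics (stale on the first post-wake checks)...

    def wake(self):
        self.hibernating = False   # resume probing + heartbeat injection
```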
Costs and failure modes¶
- Caller must retry. A single-shot client with no retry sees false rejections after every idle period.
- Stale-data rejections are conservative, not wrong. The throttler fails safe: rejects on stale data rather than accepting on stale data. The only "cost" is a short window of extra retries.
- Hibernation-aware clients can do better — e.g. dispatch a lightweight warm-up call before a real workload to amortize the re-ignition.
- Time-based hibernation is brittle to spiky workloads — define T_idle with workload shape in mind.
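The warm-up idea from the list above can be sketched as a polling helper a hibernation-aware client runs before its real workload; `check_throttler` and the timing parameters are assumptions, not an actual client API.

```python
import time

def warm_up(check_throttler, timeout_s=10.0, poll_s=0.5):
    """Fire throw-away checks until the throttler accepts, paying the
    re-ignition cost up front. Returns True if it warmed up in time."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        # The first call re-ignites; later ones see progressively fresher data.
        if check_throttler():
            return True
        time.sleep(poll_s)
    return False
```

Calling this before dispatching a batch job converts the cold-start rejections into a short, predictable delay instead of failed work items.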
What to hibernate¶
| Target | Savings impact | Cold-start impact |
|---|---|---|
| Metric probing rate | Medium (probe CPU + connections) | Low — resumes immediately |
| Peer-throttler gossip | Medium (cross-service traffic) | Low |
| Heartbeat injection | High — binlog volume is the biggest cost | High — replicas must receive fresh heartbeat before lag metric is meaningful |
The asymmetry is important: heartbeat generation is the axis where hibernation pays off most, and also the axis where the cold-start penalty is highest.
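One way to encode the table is as a per-target policy, with the heartbeat axis hibernated most aggressively because its binlog cost dominates. The structure and every rate below are invented for illustration.

```python
# (active_hz, slow_hz, stops_in_hibernation) per hibernation target.
# Heartbeat injection gets the deepest slowdown: biggest savings,
# but also the longest cold start (replicas need a fresh heartbeat
# before the lag metric means anything again).
HIBERNATION_POLICY = {
    "metric_probing":   (1.0, 0.1,  True),
    "peer_gossip":      (1.0, 0.1,  True),
    "heartbeat_inject": (1.0, 0.01, True),
}
```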
Composition¶
- + patterns/singular-vs-distributed-throttler — orthogonal; hibernate any topology. Distributed topologies add coordinated-re-ignition requirements.
- + patterns/throttler-per-shard-hierarchy — the shard-primary throttler may need to re-ignite all its replica throttlers on first shard-scope request.
- + concepts/throttler-fail-open-vs-fail-closed — the rejection-on-stale-data window interacts with client fail-open/fail-closed policy.
Seen in¶
- sources/2026-04-21-planetscale-anatomy-of-a-throttler-part-2 — canonical wiki introduction. Shlomi Noach frames hibernation as a first-class design element of the Vitess tablet throttler, specifically motivated by the binlog cost of pt-heartbeat-style replication heartbeats and by the "hours of work / hours of idle" shape of massive background jobs.