
PATTERN Cited by 1 source

Idle-state throttler hibernation

Pattern

When a throttler and its associated metric-generation mechanisms (notably replication heartbeats) go unused for a period, slow or stop their activity to cut self-cost. On the next client request, re-ignite: resume normal probing rates + heartbeat injection + peer communication. The first few checks during the re-ignition window will read stale data and likely reject the caller, who must retry.

Why it's needed

A throttler is a load source. Running at full probing + gossip + heartbeat-write rates continuously imposes three costs:

  1. Metric-collection load on every DB / OS source.
  2. Inter-throttler communication load in distributed deployments.
  3. Binlog volume from heartbeat injection. The dominant cost at scale: "It is not uncommon to see MySQL deployments where the total size of binary logs is larger than the actual data set." Every heartbeat row persists to the binlog, replays on every replica, and is backed up.
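A back-of-envelope sketch of cost 3, under stated assumptions: the heartbeat rates are the illustrative ones listed under Mechanism, the 4-busy-hours day is a hypothetical background-job shape, and bytes_per_row is a caller-supplied guess rather than a measured MySQL binlog event size.

```python
SECONDS_PER_DAY = 86_400


def heartbeat_rows_per_day(busy_hours: float, busy_hz: float, idle_hz: float) -> int:
    """Heartbeat rows written on the primary in one day.

    busy_hours run at busy_hz; the remaining hours run at idle_hz
    (idle_hz == busy_hz means no slowdown, 0.0 means full hibernation).
    """
    busy_s = busy_hours * 3600
    idle_s = SECONDS_PER_DAY - busy_s
    return round(busy_s * busy_hz + idle_s * idle_hz)


def heartbeat_bytes_per_day(rows: int, bytes_per_row: int, replicas: int) -> int:
    """Rough byte count: one copy in the primary's binlog plus one replayed
    copy per replica. bytes_per_row is a hypothetical estimate; backups
    are not counted."""
    return rows * bytes_per_row * (1 + replicas)


always_on = heartbeat_rows_per_day(busy_hours=24, busy_hz=1.0, idle_hz=1.0)
slowed = heartbeat_rows_per_day(busy_hours=4, busy_hz=1.0, idle_hz=0.01)
hibernated = heartbeat_rows_per_day(busy_hours=4, busy_hz=1.0, idle_hz=0.0)
print(always_on, slowed, hibernated)  # 86400 15120 14400
```

For this hypothetical day nearly all of the savings come from the idle hours, and slowing heartbeats to 0.01 Hz already captures almost everything full hibernation would, which matters once the cold-start cost of stopping heartbeats entirely is taken into account.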

Background-job workloads are the canonical trigger — hours of work interleaved with hours of idle. Running the throttler at full tilt during idle hours is pure waste.

"The throttler can choose to slow down based on lack of requests. It could either stop collecting metrics altogether and go into hibernation, or it might just slow down its normal pace. It would take a client checking the throttler to re-ignite the high-frequency collection of metrics."

— Shlomi Noach, Source: sources/2026-04-21-planetscale-anatomy-of-a-throttler-part-2

Mechanism

Idle detection: no client check for T_idle seconds.

Slowdown transitions:

  • Normal → slow: e.g. probe rate 1 Hz → 0.1 Hz; heartbeat 1 Hz → 0.01 Hz.
  • Slow → hibernate: stop probing entirely; stop heartbeat injection entirely.
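Idle detection and the slowdown transitions can be sketched as a pure function from idle time to mode. This is a minimal sketch, assuming two hypothetical thresholds (t_idle for normal → slow, t_hibernate for slow → hibernate); the page itself only names T_idle, and the cadence values are the illustrative ones above.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    SLOW = "slow"
    HIBERNATE = "hibernate"


@dataclass(frozen=True)
class Cadence:
    probe_hz: float      # metric probes per second; 0.0 means stopped
    heartbeat_hz: float  # heartbeat writes per second on the primary; 0.0 means stopped


# Illustrative rates, copied from the example transitions above.
CADENCE = {
    Mode.NORMAL: Cadence(probe_hz=1.0, heartbeat_hz=1.0),
    Mode.SLOW: Cadence(probe_hz=0.1, heartbeat_hz=0.01),
    Mode.HIBERNATE: Cadence(probe_hz=0.0, heartbeat_hz=0.0),
}


def mode_for(idle_s: float, t_idle: float, t_hibernate: float) -> Mode:
    """Choose a mode from the seconds elapsed since the last client check.

    t_idle and t_hibernate are hypothetical tunables: slow down after
    t_idle seconds without a check, stop entirely after t_hibernate.
    """
    if t_idle > t_hibernate:
        raise ValueError("t_idle must not exceed t_hibernate")
    if idle_s >= t_hibernate:
        return Mode.HIBERNATE
    if idle_s >= t_idle:
        return Mode.SLOW
    return Mode.NORMAL


print(mode_for(idle_s=30, t_idle=60, t_hibernate=600))  # Mode.NORMAL
```

A timer loop would call mode_for periodically and apply CADENCE[mode]; a client check resets idle_s to zero, which is what makes it the re-ignition trigger.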

Re-ignition trigger: first client check after hibernation.

Re-ignition actions:

  1. Resume metric probing at normal cadence.
  2. Resume heartbeat injection on the primary.
  3. Wake peer throttlers in a distributed deployment — see below.
  4. Wait for replication to carry the fresh heartbeats and for metric probes to complete a full cycle.
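The re-ignition actions above can be sketched as a check() that fires its wake-up hooks once and then judges the request on whatever lag reading it holds. Step 4 appears as a staleness rule: until the probe loop records a reading younger than max_metric_age_s, every check is rejected. The class, its hooks and max_metric_age_s are hypothetical illustrations, not the Vitess tablet throttler's API.

```python
import time
from typing import Callable, List, Optional


class Throttler:
    """Sketch of a lag-based throttler that re-ignites on the first check
    after hibernation. Names and hooks are hypothetical."""

    def __init__(self, threshold_s: float, max_metric_age_s: float,
                 on_reignite: List[Callable[[], None]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold_s = threshold_s            # max acceptable replication lag
        self.max_metric_age_s = max_metric_age_s  # older readings are treated as stale
        self.on_reignite = on_reignite            # e.g. resume probes, resume heartbeats, wake peers
        self.clock = clock
        self.hibernating = False
        self.lag_s: Optional[float] = None
        self.lag_read_at: Optional[float] = None

    def hibernate(self) -> None:
        """Called by the idle detector (stopping probes/heartbeats not shown)."""
        self.hibernating = True

    def record_lag(self, lag_s: float) -> None:
        """Called by the probe loop with each fresh lag reading."""
        self.lag_s = lag_s
        self.lag_read_at = self.clock()

    def check(self) -> bool:
        """Return True to admit the caller, False to reject (caller retries)."""
        if self.hibernating:
            self.hibernating = False
            for action in self.on_reignite:  # fire re-ignition actions once
                action()
        # Fail safe: with no reading, or only a stale one, reject rather than guess.
        if self.lag_s is None or self.lag_read_at is None:
            return False
        if self.clock() - self.lag_read_at > self.max_metric_age_s:
            return False
        return self.lag_s <= self.threshold_s
```

The first check after a long idle period returns False even if the last pre-hibernation reading looked healthy; checks start passing only after the resumed probe loop calls record_lag with a fresh value.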

Client-visible cold-start window:

"The first check, and likely also the next few checks, will run on stale data and potentially reject requests that would otherwise be accepted."

Duration: "a few seconds to get to a fully active operation." Client retry is the load-bearing compensation.
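Since client retry carries the cold-start window, here is a client-side sketch: retry the check with capped exponential backoff until it passes or a deadline expires. The 10-second budget and the backoff values are hypothetical defaults chosen to cover "a few seconds"; check is any callable returning True on admit, and sleep and clock are injectable so the loop can be tested without real waiting.

```python
import time
from typing import Callable


def check_with_retry(check: Callable[[], bool],
                     max_wait_s: float = 10.0,
                     initial_backoff_s: float = 0.25,
                     max_backoff_s: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> bool:
    """Return True once check() admits the caller, False if the deadline passes."""
    deadline = clock() + max_wait_s
    backoff = initial_backoff_s
    while True:
        if check():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        sleep(min(backoff, remaining))              # never sleep past the deadline
        backoff = min(backoff * 2, max_backoff_s)   # capped exponential backoff
```

With max_wait_s=0 this degenerates into the single-shot client described below, which sees false rejections after every idle period.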

Coordinated re-ignition across distributed throttlers

"With a distributed throttler design, throttlers which depend on each other should be able to inform each other upon being checked. All throttlers who communicate with each other should re-ignite upon the first request to any of them."

The first-touch client re-ignites not just the throttler it asks but its peers too; otherwise the aggregator sees a stale roll-up until each peer is independently re-ignited.
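A minimal sketch of coordinated re-ignition, assuming peers are in-process objects (real throttlers would notify each other over RPC, asynchronously). A check on any throttler wakes it and every throttler reachable through peer links; the awake flag stops the wake-up from bouncing between peers that list each other. PeerThrottler, wake and check are hypothetical names.

```python
from typing import List


class PeerThrottler:
    """Hypothetical throttler node; peers are throttlers it communicates with."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.awake = False
        self.peers: List["PeerThrottler"] = []

    def wake(self, woken: List[str]) -> None:
        """Re-ignite this node, then propagate to its peers.

        Returning early when already awake stops the wake-up from
        bouncing back and forth between mutually linked peers."""
        if self.awake:
            return
        self.awake = True  # a real node would resume probing/heartbeats here
        woken.append(self.name)
        for peer in self.peers:
            peer.wake(woken)

    def check(self) -> List[str]:
        """Client entry point: wake this node and all reachable peers.
        Returns the names re-ignited by this call (empty if all were awake)."""
        woken: List[str] = []
        self.wake(woken)
        return woken


a, b, c = PeerThrottler("a"), PeerThrottler("b"), PeerThrottler("c")
a.peers, b.peers, c.peers = [b], [a, c], [b]
print(b.check())  # ['b', 'a', 'c']
print(a.check())  # []
```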

Costs and failure modes

  • Caller must retry. A single-shot client with no retry sees false rejections after every idle period.
  • Stale-data rejections are conservative, not wrong. The throttler fails safe: rejects on stale data rather than accepting on stale data. The only "cost" is a short window of extra retries.
  • Hibernation-aware clients can do better, e.g. dispatch a lightweight warm-up call ahead of the real workload so that the re-ignition window has already passed when real requests arrive.
  • Time-based hibernation is brittle to spiky workloads — define T_idle with workload shape in mind.

What to hibernate

| Target | Savings impact | Cold-start impact |
|---|---|---|
| Metric probing rate | Medium (probe CPU + connections) | Low; resumes immediately |
| Peer-throttler gossip | Medium (cross-service traffic) | Low |
| Heartbeat injection | High; binlog volume is the biggest cost | High; replicas must receive a fresh heartbeat before the lag metric is meaningful |

The asymmetry is important: heartbeat generation is the axis where hibernation pays off most, and also the axis where the cold-start penalty is highest.
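A rough worst-case model of that asymmetry, under assumptions this page does not state: probing and heartbeat writes both restart the instant the throttler re-ignites, probes run every probe_interval_s, and lag is measured pt-heartbeat style (current time minus the newest heartbeat timestamp the replica has applied). If heartbeats kept flowing, the immediate probe is already meaningful; if they were hibernated, that probe sees an hours-old heartbeat, so the throttler must wait for a fresh one to replicate and for the next probe after it lands.

```python
def worst_case_warmup_s(probe_latency_s: float, probe_interval_s: float,
                        replication_delay_s: float, heartbeats_hibernated: bool) -> float:
    """Upper bound on seconds from re-ignition to a meaningful lag reading.

    Ignores peer gossip and clock skew; all inputs are caller-supplied estimates.
    """
    if not heartbeats_hibernated:
        # The probe fired at re-ignition reads a heartbeat that never stopped.
        return probe_latency_s
    # The first probe still sees the old heartbeat. The fresh heartbeat lands
    # after replication_delay_s; the next probe starts at most probe_interval_s later.
    return replication_delay_s + probe_interval_s + probe_latency_s


print(worst_case_warmup_s(0.25, 1.0, 0.5, heartbeats_hibernated=False))  # 0.25
print(worst_case_warmup_s(0.25, 1.0, 0.5, heartbeats_hibernated=True))   # 1.75
```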


Seen in

  • sources/2026-04-21-planetscale-anatomy-of-a-throttler-part-2 — canonical wiki introduction. Shlomi Noach frames hibernation as a first-class design element of the Vitess tablet throttler, specifically motivated by the binlog cost of pt-heartbeat-style replication heartbeats and by the "hours of work / hours of idle" shape of massive background jobs.