CONCEPT Cited by 1 source

Throttler hibernation

Definition

Throttler hibernation is the design posture of slowing or pausing the throttler's own activity — metric collection, inter-throttler communication, and underlying metric generation (e.g. replication heartbeats) — during idle periods when no clients are checking. Re-ignition happens on the first client request, incurring a cold-start window during which checks run on stale data.

"The throttler can choose to slow down based on lack of requests. It could either stop collecting metrics altogether and go into hibernation, or it might just slow down its normal pace. It would take a client checking the throttler to re-ignite the high-frequency collection of metrics."

— Shlomi Noach, Source: sources/2026-04-21-planetscale-anatomy-of-a-throttler-part-2
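
The hibernate and re-ignite cycle described above can be sketched as a small state machine. This is a minimal illustration, not the source's implementation: the class name, the `idle_timeout` parameter, and the injectable `clock` are all assumptions made for the example.

```python
import time
from typing import Callable, Optional


class HibernatingThrottler:
    """Toy throttler that hibernates when no client has checked recently."""

    def __init__(self, idle_timeout: float,
                 clock: Callable[[], float] = time.monotonic):
        # Hypothetical knob: seconds without a client check before hibernating.
        self.idle_timeout = idle_timeout
        self.clock = clock
        self.last_check: Optional[float] = None  # None: never checked
        self.active = False  # start in hibernation

    def tick(self) -> bool:
        """Called by the collection loop. True means: collect metrics now."""
        if self.active and self.last_check is not None:
            if self.clock() - self.last_check >= self.idle_timeout:
                self.active = False  # idle for too long: hibernate
        return self.active

    def check(self) -> bool:
        """Client check. True means this check re-ignited the throttler,
        so the caller should expect stale data for a short while."""
        was_active = self.tick()
        self.active = True
        self.last_check = self.clock()
        return not was_active
```

The return value of `check` is one way to surface the cold-start condition to the caller; a real throttler would also have to restart collection and heartbeats at this point.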

Why hibernate

A throttler is itself a load source. Three cost axes:

  1. Metric-collection network + compute — persistent connections to every DB server, repeated probing at sub-second intervals.
  2. Inter-throttler communication — in distributed throttler designs, each throttler polls its siblings.
  3. Metric generation — the most expensive case: heartbeat injection for replication-lag measurement writes rows that persist to binary logs and re-replay on every replica. "It is not uncommon to see MySQL deployments where the total size of binary logs is larger than the actual data set."

Massive background jobs are the canonical trigger: "A job can take a few hours to run, but there may yet be a few hours break between two jobs. During that break, there isn't strictly a need for the throttler to collect data at high rate."
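
To get a feel for the heartbeat cost during such a break, a rough count of row writes is enough. The sketch below assumes one heartbeat row write per interval on the primary, replayed once on each replica; the function name and all numbers (interval, break length, replica count) are hypothetical and chosen only for illustration.

```python
def heartbeat_row_writes(interval_s: float, window_s: float, replicas: int) -> int:
    """Row writes applied across a primary and its replicas in a time window.

    Assumes one heartbeat write per interval on the primary, replayed once
    on each replica. Real deployments may differ.
    """
    on_primary = int(window_s // interval_s)
    return on_primary * (1 + replicas)


# Hypothetical scenario: a 3-hour break between jobs, 2 replicas.
BREAK_S = 3 * 60 * 60
full_rate = heartbeat_row_writes(interval_s=0.25, window_s=BREAK_S, replicas=2)
slowed = heartbeat_row_writes(interval_s=10.0, window_s=BREAK_S, replicas=2)
print(full_rate, slowed)  # 129600 3240
```

Under these assumptions, slowing heartbeats from four per second to one every ten seconds cuts the writes during the break by a factor of 40. Stopping them entirely removes the writes altogether.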

What hibernation covers

  • Metric collection rate: drop from the normal high-frequency pace to a sparse keep-alive, or stop entirely.
  • Inter-throttler communication: slowed or stopped in the same way.
  • Metric generation: especially heartbeat injection, whose writes persist in the binary logs and make it the most expensive of the three.
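
One way to keep the three activities in step is to derive all of their intervals from a single mode. The following is a sketch under assumed names and values (`Mode`, `Intervals`, `SLOW_FACTOR`, and the one-second baseline are illustrative, not taken from the source).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    ACTIVE = "active"
    SLOWED = "slowed"
    STOPPED = "stopped"


@dataclass(frozen=True)
class Intervals:
    """Seconds between runs of each activity; None means paused."""
    collect: Optional[float]
    peer_poll: Optional[float]
    heartbeat: Optional[float]


# Hypothetical baseline and slowdown factor, for illustration only.
ACTIVE_INTERVALS = Intervals(collect=1.0, peer_poll=1.0, heartbeat=1.0)
SLOW_FACTOR = 30.0


def intervals_for(mode: Mode) -> Intervals:
    """Derive all three intervals from one mode so they hibernate together."""
    if mode is Mode.ACTIVE:
        return ACTIVE_INTERVALS
    if mode is Mode.SLOWED:
        return Intervals(
            collect=ACTIVE_INTERVALS.collect * SLOW_FACTOR,
            peer_poll=ACTIVE_INTERVALS.peer_poll * SLOW_FACTOR,
            heartbeat=ACTIVE_INTERVALS.heartbeat * SLOW_FACTOR,
        )
    return Intervals(collect=None, peer_poll=None, heartbeat=None)
```

Because heartbeats share the mode with collection, the heartbeat generator cannot keep running at full rate while the rest of the throttler sleeps.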

The cold-start penalty

Hibernation has a caller-visible cost:

"The first check, and likely also the next few checks, will run on stale data and potentially reject requests that would otherwise be accepted."

Re-ignition takes "a few seconds to get to a fully active operation, when the throttler has re-engaged, heartbeats re-generated, and replication is caught up with at least the very first re-generated heartbeats, and the clients must be prepared for some retries."

Client retries compensate for this window. A client that does not retry sees spurious rejections after every idle period.
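
A client-side retry loop might look like the sketch below. The function name, the attempt count, and the fixed delay are assumptions for illustration; `check` stands in for whatever call the client makes to the throttler and returns True when the request is accepted.

```python
import time
from typing import Callable


def check_with_retry(check: Callable[[], bool],
                     attempts: int = 5,
                     delay_s: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Retry a throttler check a bounded number of times.

    Returns True as soon as one check is accepted, False if every
    attempt is rejected. Sleeps between attempts, not after the last.
    """
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

The retry budget is bounded on purpose: a rejection may reflect genuine load rather than stale data, so the client should eventually give up and back off instead of looping forever.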

Coordinated re-ignition in distributed throttlers

"With a distributed throttler design, throttlers which depend on each other should be able to inform each other upon being checked. All throttlers who communicate with each other should re-ignite upon the first request to any of them."

The first client check after an idle period re-ignites not just the throttler it queried but all of that throttler's peers, avoiding a cascading series of cold starts.
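
The propagation can be illustrated with an in-process simulation. This is a sketch only: the `PeerThrottler` class is hypothetical, direct method calls stand in for network messages, and forwarding the wake-up transitively through peers of peers is one interpretation of the quoted requirement.

```python
from typing import List, Set


class PeerThrottler:
    """Toy throttler that wakes its peers when a client checks it."""

    def __init__(self, name: str):
        self.name = name
        self.peers: List["PeerThrottler"] = []
        self.active = False  # start in hibernation

    def check(self) -> None:
        """Client check: wake this throttler and notify its peers."""
        self.reignite(seen=set())

    def reignite(self, seen: Set[str]) -> None:
        """Wake up and forward the notification, visiting each node once."""
        if self.name in seen:
            return
        seen.add(self.name)
        self.active = True
        for peer in self.peers:
            peer.reignite(seen)


# Hypothetical topology: a <-> b <-> c. The client only ever talks to a.
a, b, c = PeerThrottler("a"), PeerThrottler("b"), PeerThrottler("c")
a.peers = [b]
b.peers = [a, c]
c.peers = [b]
a.check()
```

After the single check on `a`, all three throttlers are active. The `seen` set prevents the notification from bouncing between peers indefinitely.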

Design checklist

  • Define the idle-detection criterion (e.g. no client check for N seconds).
  • Choose: slow or stop. Slowing keeps some data fresh; stopping saves more but lengthens the re-ignition window.
  • Couple metric generation to the same policy. Heartbeats are the big binlog-cost item; hibernating the throttler without hibernating the heartbeat generator leaves most of the savings unrealized.
  • Document the cold-start window to clients. Retries must be assumed.
  • Coordinate re-ignition across peer throttlers.

Seen in

  • sources/2026-04-21-planetscale-anatomy-of-a-throttler-part-2 — canonical wiki introduction. Shlomi Noach lays out the hibernation rationale (binlog cost of heartbeats), the re-ignition semantics (first client request wakes the system; first few checks read stale data), and the coordinated-re-ignition requirement for distributed throttlers.