PlanetScale — Anatomy of a Throttler, part 2
Summary
Shlomi Noach (Vitess maintainer, now at PlanetScale) continues
his throttler-architecture series. Part 1 established the
problem shape; part 2 opens up the deployment-topology design
space — singular vs distributed, direct-access vs
host-agent-mediated, and the Vitess tablet
throttler's per-host-plus-shard-primary hierarchy as the
canonical working example. It then tackles the two
self-cost questions every throttler has to answer: what happens
when the throttler is unavailable (fail-open vs fail-closed,
active-passive, AZ-sibling designs), and how do you keep the
throttler itself from generating load (busy-loop avoidance,
hibernation of metric
collection + heartbeat generation during idle periods). The post
is the canonical wiki source for replication-lag heartbeats
as implemented by tools like
pt-heartbeat
and for the throttler-hibernation trade-off (first few
requests after idle are rejected on stale data; caller retries
drive the system back to fully engaged).
Key takeaways
- Singular vs distributed is a topology choice, not a correctness one. A singular throttler is a single monolithic service that probes all database servers + OS metrics directly. It is "a simple, monolithic, synchronous approach" (Source: sources/2026-04-21-planetscale-anatomy-of-a-throttler-part-2). A distributed throttler runs multiple collaborating throttlers (per-AZ, per-functional-partition, per-host). The article walks both sides of the space and concludes distribution is needed at scale but introduces layered polling-interval staleness.
- The monolithic throttler has two structural problems. (1) Scalability: "There's only so many connections it can maintain while running high-frequency probing." (2) Direct-metric-access restrictions: many environments don't let one service hold persistent connections to every database + OS metric source; the workaround is to deploy a per-host metric-scraper agent that exposes metrics over HTTP, but "in adding this agent component, even if it's the simplest script to scrape and publish metrics, we've effectively turned our singular throttler into a distributed multi-component system" — and now the throttler sees metrics up to 2 seconds stale instead of 1 second stale, because the agent-poll interval stacks on top of the throttler-poll interval.
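The layered-staleness arithmetic is simple enough to state directly. A minimal sketch (the function name and the additive worst-case model are illustrative, not from the post):

```python
def worst_case_staleness_s(*poll_intervals_s: float) -> float:
    """Worst-case metric age when each polling layer samples independently.

    Each layer can hold a value for up to its full interval before
    refreshing it, so the per-layer staleness bounds add up.
    """
    return sum(poll_intervals_s)

# Direct probing at 1 Hz: metric is up to 1 second stale.
assert worst_case_staleness_s(1.0) == 1.0
# Agent collects at 1 Hz and the throttler polls the agent at 1 Hz,
# independently: metric is up to 2 seconds stale.
assert worst_case_staleness_s(1.0, 1.0) == 2.0
```

Each polling layer added for deployment convenience widens the worst-case metric-age bound by its own interval.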
- High availability forces the fail-open / fail-closed decision. "If no throttler is available, should clients consider themselves rejected, or should they proceed with full power?" The common approach is a bounded wait then proceed unthrottled — "hold off up to some timeout and, from there on, proceed unthrottled, taking into consideration the possibility that the throttler may not be up for a while." This is the canonical wiki framing for concepts/throttler-fail-open-vs-fail-closed. Other HA options named: active-passive (traffic always to active, passive ready to step up, optionally collecting metrics while passive), AZ-parallel singular throttlers (independent, may disagree at any point in time but overall equivalent).
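The bounded-wait-then-fail-open client can be sketched minimally. All names, the timeout values, and the use of `ConnectionError` as the "throttler unreachable" signal are illustrative assumptions, not the post's API:

```python
import time

def check_with_fail_open(check_fn, timeout_s=2.0, retry_interval_s=0.25):
    """Bounded wait for a throttler verdict; fail open after the timeout.

    check_fn() returns True (proceed) or False (rejected) when the
    throttler answers, and raises ConnectionError when it is unreachable.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            return check_fn()  # throttler answered: obey its verdict
        except ConnectionError:
            time.sleep(retry_interval_s)  # retry until the deadline
    # Timeout elapsed with no answer: proceed unthrottled (fail open),
    # accepting that the throttler may not be up for a while.
    return True
```

A fail-closed variant would simply `return False` at the end instead; the post frames this as the client-side design decision rather than prescribing one answer.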
- The Vitess tablet throttler is the canonical working example. "The Vitess tablet throttler combines multiple design approaches to achieve different throttling scopes. The throttler runs on each and every vttablet, mapping one throttler for each MySQL database server." Each vttablet throttler collects its own host + MySQL metrics; the shard primary's throttler aggregates metrics from all replica throttlers in its replication topology to represent the "shard throttler." Canonical per-host + shard-roll-up instance.
- Metric scope matters: shard vs host. Different workloads consult different metric scopes against the same throttler hierarchy.
- Massive writes to the shard primary: throttled by replication lag — a shard-scope metric (highest lag among all replicas). Clients consult the primary's throttler.
- Massive reads on a specific replica: can pollute that replica's page cache + overload its disk I/O without affecting other replicas' replication performance. Clients check only the throttler on the specific replica — a host-scope metric. The canonical wiki framing: "This introduces the concept of a metric's scope, which can be an entire shard or a specific host in this scenario."
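The two scopes reduce to two different lookups against the same lag data. A minimal sketch (function names and the replica-lag map are illustrative):

```python
def shard_scope_lag_s(replica_lags_s):
    """Shard-scope replication lag: the highest lag among all replicas.

    This is what a massive writer consults on the shard primary's throttler.
    """
    return max(replica_lags_s)

def host_scope_metric(metrics_by_host, host):
    """Host-scope metric: only the specific host's own value matters.

    This is what a massive reader consults on one particular replica.
    """
    return metrics_by_host[host]

lags = {"replica-1": 0.4, "replica-2": 2.7, "replica-3": 0.9}
assert shard_scope_lag_s(lags.values()) == 2.7       # writes see the worst replica
assert host_scope_metric(lags, "replica-1") == 0.4   # reads on replica-1 see only it
```

The same hierarchy serves both: the shard primary's throttler rolls replica metrics up into the shard-scope view, while each per-host throttler answers host-scope checks directly.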
- Cross-shard throttler communication is deliberately avoided. "Different shards represent distinct architectural elements, and there is no cross-shard throttler communication. This limits the hosts/services monitored by any single throttler to a sustainable amount." Connection/metric fan-out stays bounded by design.
- Busy loops are forbidden; clients sleep or ride a free-pass window. "A rejected client should sleep for a pre-determined amount of time before rechecking the throttler. Conversely, depending on the metric, a client might get a free pass for a period of time after a successful check." Worked example: replication lag is 0.5 s against a 5 s threshold → the next 4.5 s are guaranteed to be successful and can be skipped (minus metric-collection granularity). Self-cost reduction baked into the protocol.
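The free-pass window from the worked example is a one-line calculation. A minimal sketch (the function name and the granularity deduction are illustrative; the post only notes the granularity caveat in passing):

```python
def free_pass_window_s(current_lag_s, threshold_s, collection_granularity_s=1.0):
    """Seconds a client may skip re-checking after a successful check.

    Replication lag can grow by at most 1 second of lag per second of
    wall-clock time, so lag now at current_lag_s cannot cross threshold_s
    for another (threshold_s - current_lag_s) seconds; subtract the
    metric-collection granularity to stay conservative.
    """
    return max(0.0, threshold_s - current_lag_s - collection_granularity_s)

# The post's worked example: lag 0.5 s, threshold 5 s -> 4.5 s free pass
# (before accounting for collection granularity).
assert free_pass_window_s(0.5, 5.0, collection_granularity_s=0.0) == 4.5
```

Note the window logic is metric-specific: it holds for lag-like metrics that change at a bounded rate, not for spiky metrics like disk I/O.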
- Hibernation — let the throttler (and its metric generators) sleep when idle. When no clients have been checking for a while, the throttler can "slow down its normal pace" or go fully dormant; "it would take a client checking the throttler to re-ignite the high-frequency collection of metrics." The same principle applies to metric generation — notably replication-lag heartbeats. The first few checks after a hibernation period will read stale data and may be wrongly rejected; the system relies on expected client retry mechanisms to drive re-ignition.
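The idle-detection and re-ignition handshake can be sketched as a small state holder. The class name, the idle threshold, and the injected clock are all illustrative assumptions:

```python
import time

class HibernatingCollector:
    """Metric collector that goes dormant when no client has checked recently.

    High-frequency collection runs only while clients are actively checking;
    the first check after an idle period re-ignites collection, but its
    verdict may be computed from stale data.
    """

    def __init__(self, idle_after_s=60.0, clock=time.monotonic):
        self.idle_after_s = idle_after_s
        self.clock = clock                # injectable for testing
        self.last_check = clock()

    def hibernating(self):
        """Dormant once no client has checked within the idle window."""
        return self.clock() - self.last_check > self.idle_after_s

    def on_client_check(self):
        """Record the check and re-ignite collection if dormant.

        Returns True when the collector was dormant, i.e. when this
        check's verdict is based on potentially stale metrics.
        """
        was_dormant = self.hibernating()
        self.last_check = self.clock()
        return was_dormant
```

A caller seeing `True` here is exactly the "first few checks run on stale data" window the post describes; the client's retry loop carries it through.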
- Replication-lag heartbeats are the dominant lag metric, but they cost binlog disk. "The most reliable way to evaluate replication lag is by injecting timestamps on a dedicated table on the Primary server, then reading the replicated value on a replica, comparing it with the system time on said replica." pt-heartbeat is the canonical tool. Injection interval dictates lag-metric granularity, but heartbeat events are persisted in binary logs + re-written on replicas → "the introduction of heartbeats causes a significant increase in binlog generation. With more binlog events having to be persisted, more binary log files are generated per given period of time. These consume more disk space. It is not uncommon to see MySQL deployments where the total size of binary logs is larger than the actual data set." Hibernating heartbeat generation during idle periods directly reduces this cost.
- Hibernation has caller cost — retries are mandatory. "The first check, and likely also the next few checks, will run on stale data and potentially reject requests that would otherwise be accepted." Re-ignition takes "a few seconds" for heartbeat generation + replication catch-up + full metric re-engagement. The expected retry mechanism in the client is the load-bearing compensation.
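The heartbeat technique quoted above can be sketched minimally. The `_heartbeat` table name, the single-row upsert, and the helper signatures are illustrative, not pt-heartbeat's actual schema:

```python
import time

def inject_heartbeat(primary_execute):
    """On the primary: upsert the current timestamp into a heartbeat row.

    primary_execute stands in for executing parameterized SQL on the
    primary. Each injection becomes a binlog event, which is the disk
    cost the post warns about.
    """
    primary_execute(
        "REPLACE INTO _heartbeat (id, ts) VALUES (1, %s)", (time.time(),)
    )

def read_lag_s(replica_query_one):
    """On a replica: lag = replica system time minus the replicated timestamp.

    replica_query_one stands in for fetching a single value from the replica.
    """
    replicated_ts = replica_query_one("SELECT ts FROM _heartbeat WHERE id = 1")
    return time.time() - replicated_ts
```

The injection interval bounds the lag measurement's granularity: inject every 1 s and lag readings carry up to 1 s of quantization on top of the real lag. Hibernating `inject_heartbeat` while idle is what stops the binlog growth during quiet periods.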
- Distributed-throttler dependencies require coordinated re-ignition. "With a distributed throttler design, throttlers which depend on each other should be able to inform each other upon being checked. All throttlers who communicate with each other should re-ignite upon the first request to any of them."
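The coordinated re-ignition rule reduces to a broadcast on first check. A minimal sketch (class name, linkage mechanism, and the boolean active flag are illustrative; a real deployment would propagate this over the network):

```python
class CoordinatedThrottler:
    """Sketch: interdependent throttlers that re-ignite as a group.

    A check against any one member wakes every throttler it communicates
    with, so dependent metric pipelines start warming together.
    """

    def __init__(self):
        self.peers = []
        self.active = False  # dormant until the first check

    def link(self, other):
        """Declare a mutual dependency between two throttlers."""
        self.peers.append(other)
        other.peers.append(self)

    def on_check(self):
        """First request to any member re-ignites the whole group."""
        self._ignite()
        for peer in self.peers:
            peer._ignite()

    def _ignite(self):
        self.active = True  # resume high-frequency metric collection
```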
Systems / concepts / patterns extracted
- Systems
  - systems/vitess — the tablet throttler (per-vttablet + shard-primary aggregator) is the canonical working example throughout the post; extended with throttler-deployment-topology content orthogonal to prior VReplication / evalengine / Consistent Lookup Vindex coverage.
  - systems/mysql — the replication topology whose lag is measured by heartbeat injection; the binary log is the cost center of heartbeat-based lag measurement.
  - systems/planetscale — Shlomi Noach's employer and the deployment context; PlanetScale consumes Vitess throttling for its managed MySQL tier.
- Concepts
  - concepts/throttler-fail-open-vs-fail-closed — new wiki vocabulary for the client-side design decision when the throttler is unreachable: bounded wait then proceed-unthrottled vs immediate reject.
  - concepts/throttler-metric-scope — new wiki vocabulary for "the concept of a metric's scope, which can be an entire shard or a specific host"; different workloads consult different scopes.
  - concepts/throttler-hibernation — new wiki vocabulary for slowing or pausing metric collection (and heartbeat generation) during idle periods to cut self-cost, accepting the cold-start penalty on the first few checks after re-ignition.
  - concepts/replication-heartbeat — new wiki vocabulary for the canonical MySQL technique of injecting timestamp rows on the primary and measuring their arrival time on replicas; covers pt-heartbeat, binlog cost, the interval-granularity tradeoff, and the failover-aware write-only-on-primary requirement.
  - concepts/metric-staleness-from-polling-layers — new wiki vocabulary for the layered-polling-interval effect: each additional polling layer (agent collects every 1 s; throttler polls every 1 s → up-to-2-second staleness) increases the worst-case metric-age bound.
- Patterns
  - patterns/singular-vs-distributed-throttler — new canonical wiki pattern covering the topology design space (singular monolithic / AZ-parallel / active-passive / per-AZ / per-functional-partition / per-host).
  - patterns/host-agent-metrics-api — new canonical wiki pattern: a per-host metrics daemon exposes locally collected metrics over HTTP; the central consumer (throttler, monitor, scheduler) polls a single API per host instead of holding a persistent connection to every metric source. Accepted cost: layered polling staleness + upgrade/backwards-compat surface.
  - patterns/throttler-per-shard-hierarchy — new canonical wiki pattern: one throttler per host/tablet + a shard-primary throttler that aggregates all shard replicas' metrics. The Vitess tablet throttler is the canonical instance.
  - patterns/idle-state-throttler-hibernation — new canonical wiki pattern: slow or stop metric collection + metric generation (heartbeat injection) during idle periods; re-ignite on first request; clients retry through the stale window.
Operational numbers
- Monolithic-throttler metric staleness ceiling = "up to 1 second stale" (with 1 Hz direct probing).
- Agent-mediated throttler metric staleness ceiling = "up to 2 seconds stale" (agent collects 1 Hz + throttler polls the agent 1 Hz independently).
- Free-pass window worked example = replication lag 0.5 s, threshold 5 s → next 4.5 s are guaranteed successful (minus metric-collection granularity).
- Binlog-size observation = "It is not uncommon to see MySQL deployments where the total size of binary logs is larger than the actual data set." — economic motivation for hibernating heartbeat generation.
- Re-ignition latency = "a few seconds to get to a fully active operation" after hibernation, before throttler, heartbeat generator, and replication catch up.
- Vitess shard topology cap — no explicit number; the design rule is "this limits the hosts/services monitored by any single throttler to a sustainable amount" via per-shard partitioning + no cross-shard throttler communication.
Caveats
- No latency / throughput numbers for the throttler itself. The post is topology + design-space focused; it doesn't quantify the check-path latency, the metric-collection CPU cost, or the heartbeat binlog byte-cost in production.
- No comparative data between topologies. No benchmark of singular-with-agent vs per-host-distributed at the same fleet size.
- No PlanetScale-specific production numbers. Noach is writing from his Vitess-maintainer perspective; the post is vendor-educational, not a production post-mortem.
- Fail-open vs fail-closed trade-off is named but not quantified. The post frames the decision but doesn't walk the operational consequences (e.g. cascading-load risk from fail-open during a wider outage).
- Active-passive metric-collection-while-passive described but not evaluated. The post doesn't address split-brain or the cost of running full probing on standby.
- Cross-shard throttler communication deliberately excluded. No treatment of rare legitimate cross-shard throttling needs (global rate limits, cross-shard batch writes).
- Part-of-a-series. Part 1 is referenced but not ingested on the wiki at this ingest time; part 3 (clients, prioritization, starvation) is forward-referenced but not yet published at fetch time.
- Tier-3 source, scope-disposition on-scope per the PlanetScale skip-rules: Vitess-internals content by a Vitess core maintainer is default-include, and the post's architectural density is ~100% of the body (every paragraph advances a topology, cost analysis, or metric-mechanism primitive). Not a product announcement, not marketing copy; no "Introducing" / "Now available" / pricing content.
Source
- Original: https://planetscale.com/blog/anatomy-of-a-throttler-part-2
- Raw markdown:
raw/planetscale/2026-04-21-anatomy-of-a-throttler-part-2-9cf3c465.md
Related
- systems/vitess
- systems/mysql
- systems/planetscale
- concepts/throttler-fail-open-vs-fail-closed
- concepts/throttler-metric-scope
- concepts/throttler-hibernation
- concepts/replication-heartbeat
- concepts/metric-staleness-from-polling-layers
- patterns/singular-vs-distributed-throttler
- patterns/host-agent-metrics-api
- patterns/throttler-per-shard-hierarchy
- patterns/idle-state-throttler-hibernation
- companies/planetscale