Throttler-per-shard hierarchy

Pattern

Deploy one throttler per host / tablet, and let the shard-primary's throttler act as the aggregator for its shard's metrics. Clients then target one of two scopes:

  • Host-scope: consult the throttler on the specific host the workload touches.
  • Shard-scope: consult the throttler on the shard primary, which aggregates metrics from every replica in its replication topology.
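The two scopes map to two different check targets. A minimal sketch of the client-side choice, assuming Vitess's vttablet endpoint paths (`/throttler/check` for shard scope, `/throttler/check-self` for host scope); the `Scope` type and `checkURL` helper are hypothetical illustration, not Vitess API:

```go
package main

import "fmt"

// Scope selects which throttler a client consults.
type Scope int

const (
	HostScope  Scope = iota // the specific host the workload touches
	ShardScope              // the shard primary, which holds the shard rollup
)

// checkURL returns the throttler endpoint to consult. "check-self" answers
// from the host's own metrics; "check" answers from the shard-level rollup
// maintained on the primary.
func checkURL(scope Scope, host, primary string) string {
	if scope == HostScope {
		return fmt.Sprintf("http://%s/throttler/check-self", host)
	}
	return fmt.Sprintf("http://%s/throttler/check", primary)
}

func main() {
	// A massive read on replica r2 asks r2 itself; a massive write asks the primary.
	fmt.Println(checkURL(HostScope, "r2.shard-a", "primary.shard-a"))
	fmt.Println(checkURL(ShardScope, "r2.shard-a", "primary.shard-a"))
}
```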

This is the Vitess tablet throttler architecture, and the canonical instance of the pattern.

Why the shape exists

Two observations from the Noach post drive the design:

  1. Different workloads care about different metric scopes. Massive writes on the primary care about max-lag across all replicas (shard-scope); massive reads on one replica care about that replica's CPU / page cache / disk I/O (host-scope). One throttler per scope is wrong: too many throttlers for clients to know about, and the primary has to know about replicas anyway for shard-scope rollups.
  2. Cross-shard communication is an unnecessary fan-out tax. "Different shards represent distinct architectural elements, and there is no cross-shard throttler communication. This limits the hosts/services monitored by any single throttler to a sustainable amount."
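The fan-out bound in observation 2 is easy to quantify: with S shards of R replicas each, one central throttler would have to probe every one of the S·R hosts, while each shard-primary throttler under this pattern probes only the R replicas of its own shard. A back-of-the-envelope sketch (topology numbers are illustrative):

```go
package main

import "fmt"

// probesCentral is what a single global throttler would have to monitor:
// every replica in every shard.
func probesCentral(shards, replicasPerShard int) int {
	return shards * replicasPerShard
}

// probesPerShardPrimary is what each shard-primary throttler monitors
// under this pattern: only its own shard's replicas.
func probesPerShardPrimary(replicasPerShard int) int {
	return replicasPerShard
}

func main() {
	// Illustrative topology: 64 shards x 3 replicas each.
	fmt.Println(probesCentral(64, 3))     // 192 hosts for a central throttler
	fmt.Println(probesPerShardPrimary(3)) // 3 hosts per shard primary, regardless of shard count
}
```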

"The Vitess tablet throttler combines multiple design approaches to achieve different throttling scopes. The throttler runs on each and every vttablet, mapping one throttler for each MySQL database server. Each such throttler first and foremost collects metrics from its own vttablet host and from its associated MySQL server. Then, the throttlers (or vttablet servers) of any shard, or a replication topology (primary and replicas) collaborate to represent the 'shard' throttler. The throttler service on the primary server takes the responsibility for collecting the metrics from all of the shard's throttlers and aggregating them as the 'shard' metrics."

— Shlomi Noach, Source: sources/2026-04-21-planetscale-anatomy-of-a-throttler-part-2

Topology

   Shard A                        Shard B
 ┌─────────────────────────┐    ┌─────────────────────────┐
 │  Primary-throttler      │    │  Primary-throttler      │
 │  (rollup of shard A's   │    │  (rollup of shard B's   │
 │   replica metrics)      │    │   replica metrics)      │
 │  ▲  ▲  ▲                │    │  ▲  ▲  ▲                │
 │  │  │  │  pull lag      │    │  │  │  │  pull lag      │
 │  │  │  │  from replicas │    │  │  │  │  from replicas │
 │  │  │  │                │    │  │  │  │                │
 │  R1 R2 R3  (replicas)   │    │  R1 R2 R3  (replicas)   │
 │  per-host throttlers    │    │  per-host throttlers    │
 └─────────────────────────┘    └─────────────────────────┘
          ^                                ^
          │  host-scope clients            │  host-scope clients
          │  hit R_i throttler directly    │  hit R_i throttler directly
          │                                │
   shard-scope clients              shard-scope clients
   hit primary throttler            hit primary throttler

   NO cross-shard throttler traffic (by design)
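The primary's rollup in the diagram can be sketched as a pull-and-aggregate step. The `replicaLag` type is hypothetical (the real Vitess probe carries more metrics per tablet), but the aggregation follows the pattern: the shard metric is the worst replica lag, and no cross-shard input ever enters the computation.

```go
package main

import "fmt"

// replicaLag holds one replica's self-reported lag, as pulled by the
// shard primary's throttler. (Hypothetical type for illustration.)
type replicaLag struct {
	host string
	lag  float64 // seconds
}

// shardLag is the shard-scope rollup: the max lag across the primary's
// own replicas. A primary only ever sees its own shard's replicas —
// there is no cross-shard term here by design.
func shardLag(replicas []replicaLag) float64 {
	worst := 0.0
	for _, r := range replicas {
		if r.lag > worst {
			worst = r.lag
		}
	}
	return worst
}

func main() {
	shardA := []replicaLag{{"r1", 0.4}, {"r2", 2.3}, {"r3", 0.9}}
	fmt.Println(shardLag(shardA)) // shard-scope clients see 2.3
}
```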

Which scope to consult

Workload                      Scope   Throttler queried
Massive write to primary      Shard   Shard-primary throttler (returns max replica lag)
Massive read on one replica   Host    That replica's throttler
Cross-shard aggregate reads   None    Consult N shard-primary throttlers independently
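The last row is the one scope no single throttler serves: since there is no cross-shard aggregator, the client itself fans out to each shard primary and proceeds only if every one approves. A sketch, with a hypothetical `checkFn` standing in for the per-primary HTTP call:

```go
package main

import "fmt"

// checkFn stands in for a call to one shard primary's throttler check
// endpoint, returning whether that shard is under its thresholds.
// (Hypothetical signature for illustration.)
type checkFn func(primary string) bool

// allShardsOK consults each shard primary independently. The client,
// not any throttler, performs the cross-shard fan-out — exactly the
// communication the throttlers themselves avoid by design.
func allShardsOK(primaries []string, check checkFn) bool {
	for _, p := range primaries {
		if !check(p) {
			return false
		}
	}
	return true
}

func main() {
	// Pretend shard B's primary is reporting over-threshold lag.
	lagging := map[string]bool{"primary.shard-b": true}
	check := func(p string) bool { return !lagging[p] }
	fmt.Println(allShardsOK([]string{"primary.shard-a", "primary.shard-b"}, check)) // false
}
```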

Why not just one central throttler per shard

That would give the same shard-scope answer but lose host-scope granularity: you'd need a separate host-scope throttler per host, duplicating the probe path. The per-host throttler is the same process that gets rolled up by the shard primary — one deployment, two access patterns.

Composition with other throttler patterns

Trade-offs

  • Bounded fan-out per throttler — a primary throttler aggregates only its own shard's replicas and never talks cross-shard; the monitoring plane naturally matches the sharding topology.
  • Shard-scope throttler = shard-primary SPOF. If the primary is unreachable, shard-scope throttling loses the rollup. Client fallback: fail-open with bounded wait.
  • Host-scope is naturally resilient — if a single replica's throttler is down, only workloads targeting that replica are affected.
  • Rollup math is metric-specific — for lag and CPU alike it's max across replicas; replication-topology health may need more complex predicates.
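The last trade-off can be made concrete: the rollup is a per-metric aggregation function, not a single formula. A sketch mapping metric names (illustrative, not Vitess's actual metric keys) to their aggregators — lag and CPU both register max, and other metrics could plug in other functions:

```go
package main

import "fmt"

// maxOf is the worst-case rollup: the highest per-replica sample wins.
func maxOf(samples []float64) float64 {
	worst := 0.0
	for _, s := range samples {
		if s > worst {
			worst = s
		}
	}
	return worst
}

// rollup maps a metric name to its shard-level aggregation. The map makes
// the "metric-specific" point explicit: both entries happen to be max
// today, but a health metric could register a different predicate.
var rollup = map[string]func([]float64) float64{
	"lag": maxOf,
	"cpu": maxOf,
}

func main() {
	fmt.Println(rollup["lag"]([]float64{0.4, 2.3, 0.9}))    // worst replica lag: 2.3
	fmt.Println(rollup["cpu"]([]float64{0.35, 0.80, 0.55})) // worst replica CPU: 0.8
}
```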

Seen in
