Throttler-per-shard hierarchy

Pattern

Deploy one throttler per host / tablet, and let the shard-primary's throttler act as the aggregator for its shard's metrics. Clients then target one of two scopes:

  • Host-scope: consult the throttler on the specific host the workload touches.
  • Shard-scope: consult the throttler on the shard primary, which aggregates metrics from every replica in its replication topology.
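The two scopes map to two different check targets. A minimal sketch of the client-side choice, assuming Vitess's vttablet endpoint paths (`/throttler/check` for shard scope, `/throttler/check-self` for host scope); the `Scope` type and `checkURL` helper are hypothetical illustration, not Vitess API:

```go
package main

import "fmt"

// Scope selects which throttler a client consults.
type Scope int

const (
	HostScope  Scope = iota // the specific host the workload touches
	ShardScope              // the shard primary, which holds the shard rollup
)

// checkURL returns the throttler endpoint to consult. "check-self" answers
// from the host's own metrics; "check" answers from the shard-level rollup
// maintained on the primary.
func checkURL(scope Scope, host, primary string) string {
	if scope == HostScope {
		return fmt.Sprintf("http://%s/throttler/check-self", host)
	}
	return fmt.Sprintf("http://%s/throttler/check", primary)
}

func main() {
	// A massive read on replica r2 asks r2 itself; a massive write asks the primary.
	fmt.Println(checkURL(HostScope, "r2.shard-a", "primary.shard-a"))
	fmt.Println(checkURL(ShardScope, "r2.shard-a", "primary.shard-a"))
}
```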

This is the Vitess tablet throttler architecture, and the canonical instance of the pattern.

Why the shape exists

Two observations from the Noach post drive the design:

  1. Different workloads care about different metric scopes. Massive writes on the primary care about max-lag across all replicas (shard-scope); massive reads on one replica care about that replica's CPU / page cache / disk I/O (host-scope). One throttler per scope is wrong: too many throttlers for clients to know about, and the primary has to know about replicas anyway for shard-scope rollups.
  2. Cross-shard communication is an unnecessary fan-out tax. "Different shards represent distinct architectural elements, and there is no cross-shard throttler communication. This limits the hosts/services monitored by any single throttler to a sustainable amount."
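The fan-out bound in observation 2 is easy to quantify: with S shards of R replicas each, one central throttler would have to probe every one of the S·R hosts, while each shard-primary throttler under this pattern probes only the R replicas of its own shard. A back-of-the-envelope sketch (topology numbers are illustrative):

```go
package main

import "fmt"

// probesCentral is what a single global throttler would have to monitor:
// every replica in every shard.
func probesCentral(shards, replicasPerShard int) int {
	return shards * replicasPerShard
}

// probesPerShardPrimary is what each shard-primary throttler monitors
// under this pattern: only its own shard's replicas.
func probesPerShardPrimary(replicasPerShard int) int {
	return replicasPerShard
}

func main() {
	// Illustrative topology: 64 shards x 3 replicas each.
	fmt.Println(probesCentral(64, 3))     // 192 hosts for a central throttler
	fmt.Println(probesPerShardPrimary(3)) // 3 hosts per shard primary, regardless of shard count
}
```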

"The Vitess tablet throttler combines multiple design approaches to achieve different throttling scopes. The throttler runs on each and every vttablet, mapping one throttler for each MySQL database server. Each such throttler first and foremost collects metrics from its own vttablet host and from its associated MySQL server. Then, the throttlers (or vttablet servers) of any shard, or a replication topology (primary and replicas) collaborate to represent the 'shard' throttler. The throttler service on the primary server takes the responsibility for collecting the metrics from all of the shard's throttlers and aggregating them as the 'shard' metrics."

— Shlomi Noach, Source: sources/2026-04-21-planetscale-anatomy-of-a-throttler-part-2

Topology

   Shard A                        Shard B
 ┌─────────────────────────┐    ┌─────────────────────────┐
 │  Primary-throttler      │    │  Primary-throttler      │
 │  (rollup of shard A's   │    │  (rollup of shard B's   │
 │   replica metrics)      │    │   replica metrics)      │
 │  ▲  ▲  ▲                │    │  ▲  ▲  ▲                │
 │  │  │  │  pull lag      │    │  │  │  │  pull lag      │
 │  │  │  │  from replicas │    │  │  │  │  from replicas │
 │  │  │  │                │    │  │  │  │                │
 │  R1 R2 R3  (replicas)   │    │  R1 R2 R3  (replicas)   │
 │  per-host throttlers    │    │  per-host throttlers    │
 └─────────────────────────┘    └─────────────────────────┘
          ^                                ^
          │  host-scope clients            │  host-scope clients
          │  hit R_i throttler directly    │  hit R_i throttler directly
          │                                │
   shard-scope clients              shard-scope clients
   hit primary throttler            hit primary throttler

   NO cross-shard throttler traffic (by design)
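The primary's rollup in the diagram can be sketched as a pull-and-aggregate step. The `replicaLag` type is hypothetical (the real Vitess probe carries more metrics per tablet), but the aggregation follows the pattern: the shard metric is the worst replica lag, and no cross-shard input ever enters the computation.

```go
package main

import "fmt"

// replicaLag holds one replica's self-reported lag, as pulled by the
// shard primary's throttler. (Hypothetical type for illustration.)
type replicaLag struct {
	host string
	lag  float64 // seconds
}

// shardLag is the shard-scope rollup: the max lag across the primary's
// own replicas. A primary only ever sees its own shard's replicas —
// there is no cross-shard term here by design.
func shardLag(replicas []replicaLag) float64 {
	worst := 0.0
	for _, r := range replicas {
		if r.lag > worst {
			worst = r.lag
		}
	}
	return worst
}

func main() {
	shardA := []replicaLag{{"r1", 0.4}, {"r2", 2.3}, {"r3", 0.9}}
	fmt.Println(shardLag(shardA)) // shard-scope clients see 2.3
}
```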

Which scope to consult

Workload                      Scope   Throttler queried
Massive write to primary      Shard   Shard-primary throttler (returns max replica lag)
Massive read on one replica   Host    That replica's throttler
Cross-shard aggregate reads   None    Consult N shard-primary throttlers independently
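The last row is the one scope no single throttler serves: since there is no cross-shard aggregator, the client itself fans out to each shard primary and proceeds only if every one approves. A sketch, with a hypothetical `checkFn` standing in for the per-primary HTTP call:

```go
package main

import "fmt"

// checkFn stands in for a call to one shard primary's throttler check
// endpoint, returning whether that shard is under its thresholds.
// (Hypothetical signature for illustration.)
type checkFn func(primary string) bool

// allShardsOK consults each shard primary independently. The client,
// not any throttler, performs the cross-shard fan-out — exactly the
// communication the throttlers themselves avoid by design.
func allShardsOK(primaries []string, check checkFn) bool {
	for _, p := range primaries {
		if !check(p) {
			return false
		}
	}
	return true
}

func main() {
	// Pretend shard B's primary is reporting over-threshold lag.
	lagging := map[string]bool{"primary.shard-b": true}
	check := func(p string) bool { return !lagging[p] }
	fmt.Println(allShardsOK([]string{"primary.shard-a", "primary.shard-b"}, check)) // false
}
```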

Why not just one central throttler per shard

That would give the same shard-scope answer but lose host-scope granularity: you'd need a separate host-scope throttler per host, duplicating the probe path. The per-host throttler is the same process that gets rolled up by the shard primary — one deployment, two access patterns.

Composition with other throttler patterns

Trade-offs

  • Bounded fan-out per throttler — a primary throttler aggregates only its own shard's replicas and never talks cross-shard; the monitoring plane naturally matches the sharding topology.
  • Shard-scope throttler = shard-primary SPOF. If the primary is unreachable, shard-scope throttling loses the rollup. Client fallback: fail-open with bounded wait.
  • Host-scope is naturally resilient — if a single replica's throttler is down, only workloads targeting that replica are affected.
  • Rollup math is metric-specific — for lag and CPU alike it's max across replicas; replication-topology health may need more complex predicates.
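The last trade-off can be made concrete: the rollup is a per-metric aggregation function, not a single formula. A sketch mapping metric names (illustrative, not Vitess's actual metric keys) to their aggregators — lag and CPU both register max, and other metrics could plug in other functions:

```go
package main

import "fmt"

// maxOf is the worst-case rollup: the highest per-replica sample wins.
func maxOf(samples []float64) float64 {
	worst := 0.0
	for _, s := range samples {
		if s > worst {
			worst = s
		}
	}
	return worst
}

// rollup maps a metric name to its shard-level aggregation. The map makes
// the "metric-specific" point explicit: both entries happen to be max
// today, but a health metric could register a different predicate.
var rollup = map[string]func([]float64) float64{
	"lag": maxOf,
	"cpu": maxOf,
}

func main() {
	fmt.Println(rollup["lag"]([]float64{0.4, 2.3, 0.9}))    // worst replica lag: 2.3
	fmt.Println(rollup["cpu"]([]float64{0.35, 0.80, 0.55})) // worst replica CPU: 0.8
}
```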

Seen in
