PATTERN Cited by 1 source

Collaborative throttler check API

Problem

When a database must be protected from massive batch workloads (migrations, online DDL, mass purges), the protection mechanism needs to be:

  • Cheap per check — jobs check thousands or millions of times over their lifetime.
  • Easy to adopt — unmodified application paths should not be forced through a proxy.
  • Correct by construction — rejection must prevent harm, not merely log it.

Solution

Expose the throttler as a standalone service / endpoint that clients query before each unit of work. The client contract:

while not throttler.ok():
    sleep(backoff_period)
do_subtask()

The throttler returns a boolean (or structured rejection reason). The client is expected to respect the response — rejection means stop; acceptance means proceed with one subtask's worth of work. The client is not forced through a proxy; the database path is unchanged.
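The contract above can be sketched as a small runnable loop. This is a minimal illustration, not any particular tool's implementation: `run_job`, `throttler_ok`, and the injectable `sleep` parameter are hypothetical names introduced here so that the loop can be exercised without a real throttler.

```python
import time
from typing import Callable, Iterable


def run_job(
    subtasks: Iterable[Callable[[], None]],
    throttler_ok: Callable[[], bool],
    backoff_period: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run each subtask only after the throttler approves it.

    Returns the number of rejected checks, which shows how much of the
    job's lifetime was spent holding back.
    """
    rejections = 0
    for subtask in subtasks:
        # One approval covers exactly one subtask's worth of work,
        # so the check is repeated before every subtask.
        while not throttler_ok():
            rejections += 1
            sleep(backoff_period)
        subtask()
    return rejections
```

Passing `sleep` and `throttler_ok` as parameters keeps the loop independent of how the check is transported (HTTP, RPC, in-process) and makes it easy to test.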

"Some throttler implementations are collaborative, meaning they assume clients will respect their instructions. Others act as barriers between the app and the database. Either way, if the throttler indicates that the database is overloaded, the job should hold back for a period of time and then request access again."

— Shlomi Noach, Anatomy of a Throttler, part 1

Why collaborative

Compared to a barrier throttler that intercepts every database request:

Property                 | Collaborative                                    | Barrier
Overhead per query       | Zero on non-throttled path                       | Always in path
Adoption                 | Client opt-in per job                            | Required for every request
Enforcement              | Trust-based                                      | Enforced
Operational blast radius | Limited to throttler downtime affecting new work | Full dependency on pool availability
Custom metrics           | Easy to add                                      | Harder to wire through proxy

Collaborative is the default choice for internal batch tooling (gh-ost, pt-online-schema-change, Vitess's migration subsystems), where client compliance is a reasonable assumption.

Interface shape

Minimum viable interface:

GET /throttler/check            → 200 OK / 429 Too Many Requests
GET /throttler/check?app=...    → 200 OK / 429 Too Many Requests (scoped)
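
A client-side sketch of this minimal interface, using only the Python standard library. The helper names (`check_url`, `status_allows_work`, `throttler_ok`) and the base URL are hypothetical. Treating every non-200 outcome, including an unreachable throttler, as "do not proceed" is an assumption made here (fail closed), not something the interface itself mandates.

```python
import urllib.error
import urllib.parse
import urllib.request
from typing import Optional


def check_url(base_url: str, app: Optional[str] = None) -> str:
    """Build the check URL, optionally scoped to an app."""
    url = base_url.rstrip("/") + "/throttler/check"
    if app is not None:
        url += "?" + urllib.parse.urlencode({"app": app})
    return url


def status_allows_work(status: int) -> bool:
    """200 means proceed; 429 (or anything else) means hold back."""
    return status == 200


def throttler_ok(base_url: str, app: Optional[str] = None,
                 timeout: float = 2.0) -> bool:
    """Perform one check. Assumption: fail closed on any error."""
    try:
        with urllib.request.urlopen(check_url(base_url, app),
                                    timeout=timeout) as resp:
            return status_allows_work(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx, so a 429 arrives here.
        return status_allows_work(err.code)
    except OSError:
        # Connection refused, DNS failure, timeout: hold back.
        return False
```

Whether to fail closed or fail open when the throttler is unreachable is a policy decision: failing closed protects the database at the cost of stalling jobs during throttler downtime.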

Richer shapes:

GET /throttler/check?app=vreplication&shard=shard-0
     → 429 { reason: "replication_lag", value: 7.2, threshold: 5.0 }

Scoping (app, shard, workload-class) lets the operator enable different thresholds or different metric sets for different categories of work.
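
As an illustration of scoped thresholds, here is a server-side sketch that picks a replication-lag threshold per app and produces a structured rejection in the shape shown above. The threshold table, the `purge` scope, and the `evaluate_check` function are hypothetical; a real throttler would also read the metric itself rather than take it as an argument.

```python
from typing import Dict, Optional, Tuple

# Hypothetical per-app thresholds, in seconds of replication lag.
DEFAULT_THRESHOLD_S = 5.0
THRESHOLDS_S: Dict[str, float] = {
    "vreplication": 5.0,
    "purge": 1.0,  # assumed: low-priority work is throttled earlier
}


def evaluate_check(lag_seconds: float,
                   app: Optional[str] = None) -> Tuple[int, dict]:
    """Return (HTTP status, response body) for one check."""
    if app is None:
        threshold = DEFAULT_THRESHOLD_S
    else:
        threshold = THRESHOLDS_S.get(app, DEFAULT_THRESHOLD_S)
    if lag_seconds > threshold:
        return 429, {
            "reason": "replication_lag",
            "value": lag_seconds,
            "threshold": threshold,
        }
    return 200, {}
```

With these assumed thresholds, a lag of 3.0 s rejects `purge` checks while still admitting `vreplication` and unscoped checks.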

Subtask-size tuning

Collaborative throttlers operate on a check-then-work cycle: each subtask is atomic from the throttler's point of view. Subtask size is a trade-off. It should be:

  • Small enough that one subtask cannot single-handedly tank the database between checks.
  • Large enough that per-check overhead is amortised.

A typical choice is 100 rows per subtask for row-based migration workloads. A 10M-row migration is then 100,000 subtasks and at least 100,000 throttler checks.
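
The arithmetic can be made explicit with a short sketch. The function names are hypothetical, and the per-check cost passed to `min_check_overhead_seconds` is an assumed input used only to show how check overhead scales with subtask size.

```python
from typing import Iterator, Tuple


def subtask_count(total_rows: int, rows_per_subtask: int) -> int:
    """Number of subtasks, and so the minimum number of checks."""
    if rows_per_subtask <= 0:
        raise ValueError("rows_per_subtask must be positive")
    return -(-total_rows // rows_per_subtask)  # ceiling division


def subtask_ranges(total_rows: int,
                   rows_per_subtask: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open [start, end) row ranges, one per subtask."""
    for start in range(0, total_rows, rows_per_subtask):
        yield start, min(start + rows_per_subtask, total_rows)


def min_check_overhead_seconds(total_rows: int, rows_per_subtask: int,
                               seconds_per_check: float) -> float:
    """Lower bound on time spent in checks, assuming no rejections."""
    return subtask_count(total_rows, rows_per_subtask) * seconds_per_check
```

For example, `subtask_count(10_000_000, 100)` gives 100,000. Rejected checks add to that count, which is why it is a minimum.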

Backoff policy

Clients typically implement:

  • Bounded exponential backoff on repeated rejections (1 s → 2 s → 4 s → capped at a ceiling).
  • Jitter to desynchronise concurrent clients after a release moment (see concepts/oversampling-metric-interval for the release-thundering-herd problem).
  • Check frequency somewhere around the metric's sampling interval. Checking faster than the metric updates is wasted work, because the answer cannot change until the next sample.
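
The three points above can be combined in one delay function. This is a sketch under stated assumptions: the name `backoff_delay`, the 30 s ceiling, the 25% jitter fraction, and the choice to apply jitter downward only (so the ceiling is never exceeded) are all illustrative, not taken from a specific client.

```python
import random
from typing import Callable


def backoff_delay(
    rejections: int,
    base: float = 1.0,
    ceiling: float = 30.0,
    jitter: float = 0.25,
    min_interval: float = 0.0,
    rand: Callable[[], float] = random.random,
) -> float:
    """Seconds to wait after `rejections` consecutive rejections (>= 1)."""
    if rejections < 1:
        raise ValueError("rejections must be at least 1")
    # Clamp the exponent so very long rejection streaks cannot overflow.
    exponent = min(rejections - 1, 32)
    capped = min(base * 2 ** exponent, ceiling)  # 1 s, 2 s, 4 s, ... ceiling
    # Shorten the delay by a random fraction so that clients released at
    # the same moment do not all come back at once.
    jittered = capped * (1.0 - jitter * rand())
    # Never check faster than the metric is sampled.
    return max(jittered, min_interval)
```

Setting `min_interval` to the metric's sampling interval keeps the first few retries from polling faster than the metric can change.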

Non-compliance

A client that ignores the throttler can inflict the exact load the throttler was installed to prevent. Mitigations:

  • Audit logs of client check frequency and compliance.
  • Detection of misbehaving clients (by IP or service identifier) through database load that does not correlate with throttler rejections, for example load that stays high while checks are being rejected.
  • Escalation to barrier throttling for demonstrated non-compliance.
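
A compliance audit of the kind listed above could look like the following sketch. It assumes a merged, time-ordered event log of `(client, kind)` pairs, where `kind` is one of `"check_ok"`, `"check_rejected"`, or `"subtask"`. That log format and the function name are hypothetical; real deployments would have to join throttler logs with database-side activity to obtain it.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Set, Tuple


def non_compliant_clients(events: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return clients that ran a subtask without a preceding approval."""
    approved: DefaultDict[str, bool] = defaultdict(bool)
    offenders: Set[str] = set()
    for client, kind in events:
        if kind == "check_ok":
            approved[client] = True
        elif kind == "check_rejected":
            approved[client] = False
        elif kind == "subtask":
            if not approved[client]:
                offenders.add(client)
            # An approval covers one subtask only, so it is consumed here.
            approved[client] = False
    return offenders
```

The flagged set is a starting point for investigation or for escalation to a barrier, not proof of intent: gaps in log collection would produce the same signal.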

In practice, collaborative throttlers are deployed where only known, controlled clients (internal tooling, managed migration systems) interact with the throttler. Untrusted / external clients get a barrier.

Seen in

  • sources/2026-04-21-planetscale-anatomy-of-a-throttler-part-1 — canonical framing. Noach contrasts collaborative with barrier and notes that the discussion focuses on collaborative (which is the default for the workloads he cares about).
  • systems/vitess-throttler — canonical collaborative throttler in the MySQL/Vitess ecosystem. VReplication, VDiff, MoveTables, and online DDL all use the collaborative check API before each subtask.