PATTERN Cited by 1 source

Collaborative throttler check API¶

Problem¶

When a database must be protected from massive batch workloads (migrations, online DDL, mass purges), the protection mechanism needs to be:

Cheap per check — jobs check thousands or millions of times over their lifetime.
Easy to adopt — unmodified application paths should not be forced through a proxy.
Correct by construction — rejection must prevent harm, not merely log it.

Solution¶

Expose the throttler as a standalone service / endpoint that clients query before each unit of work. The client contract:

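import time

while not throttler.ok():        # ask before every unit of work
    time.sleep(backoff_period)   # rejected: hold back, then ask again
do_subtask()                     # accepted: run exactly one subtask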

The throttler returns a boolean (or a structured rejection reason). The client is expected to respect the response: rejection means stop, and acceptance means proceed with one subtask's worth of work. The client is not forced through a proxy; the database path is unchanged.
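
The contract can be applied to a whole job as in the sketch below. It assumes a hypothetical `check` callable that returns `True` when work may proceed; `run_job`, `check`, and the injected `sleep` are illustrative names, not part of any real throttler client.

```python
import time
from typing import Callable, Iterable


def run_job(
    subtasks: Iterable[Callable[[], None]],
    check: Callable[[], bool],
    backoff_period: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run every subtask, checking the throttler before each one.

    Returns the total number of throttler checks issued.
    """
    checks = 0
    for subtask in subtasks:
        while True:
            checks += 1
            if check():            # acceptance covers exactly one subtask
                break
            sleep(backoff_period)  # rejection: hold back, then ask again
        subtask()
    return checks
```

Passing `sleep` as a parameter is only a convenience for testing; a real job would keep the default `time.sleep`.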

"Some throttler implementations are collaborative, meaning they assume clients will respect their instructions. Others act as barriers between the app and the database. Either way, if the throttler indicates that the database is overloaded, the job should hold back for a period of time and then request access again."

— Shlomi Noach, Anatomy of a Throttler, part 1

Why collaborative¶

Compared to a barrier throttler that intercepts every database request:

Property	Collaborative	Barrier
Overhead per query	Zero on non-throttled path	Always in path
Adoption	Client opt-in per job	Required for every request
Enforcement	Trust-based	Enforced
Operational blast radius	Limited to throttler downtime affecting new work	Full dependency on proxy availability
Custom metrics	Easy to add	Harder to wire through proxy

Collaborative is the default choice for internal batch tooling — gh-ost, pt-online-schema-change, Vitess's migration subsystems — where client compliance is a reasonable assumption.

Interface shape¶

Minimum viable interface:

GET /throttler/check            → 200 OK / 429 Too Many Requests
GET /throttler/check?app=...    → 200 OK / 429 Too Many Requests (scoped)

Richer shapes:

GET /throttler/check?app=vreplication&shard=shard-0
     → 429 { reason: "replication_lag", value: 7.2, threshold: 5.0 }
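
On the client side, these responses might be interpreted as in the following sketch. It assumes the 429 body is valid JSON with quoted keys, and that any unexpected status is treated as "do not proceed"; both are assumptions of this example, as are the names `CheckResult`, `build_check_url`, and `interpret`.

```python
import json
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode


@dataclass
class CheckResult:
    ok: bool
    reason: Optional[str] = None
    value: Optional[float] = None
    threshold: Optional[float] = None


def build_check_url(base: str, **scope: str) -> str:
    """Append scoping parameters (app, shard, ...) to the check endpoint."""
    query = urlencode(scope)
    return f"{base}/throttler/check" + (f"?{query}" if query else "")


def interpret(status: int, body: bytes = b"") -> CheckResult:
    """Map an HTTP status and optional JSON body to a CheckResult."""
    if status == 200:
        return CheckResult(ok=True)
    if status == 429:
        detail = json.loads(body) if body else {}
        return CheckResult(
            ok=False,
            reason=detail.get("reason"),
            value=detail.get("value"),
            threshold=detail.get("threshold"),
        )
    # Assumption: fail closed on anything else (5xx, malformed replies).
    return CheckResult(ok=False, reason=f"unexpected status {status}")
```

The HTTP call itself is left out so the mapping can be exercised without a running throttler.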

Scoping (app, shard, workload-class) lets the operator enable different thresholds or different metric sets for different categories of work.
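
A minimal server-side sketch of scoped thresholds, assuming a single replication-lag metric. The app names and threshold values are hypothetical and do not reflect any real throttler's configuration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical thresholds, in seconds of replication lag.
DEFAULT_THRESHOLD = 5.0
APP_THRESHOLDS: Dict[str, float] = {"online-ddl": 5.0, "mass-purge": 2.0}


def check(lag_seconds: float, app: Optional[str] = None) -> Tuple[int, dict]:
    """Return (HTTP status, body) for one check request."""
    threshold = APP_THRESHOLDS.get(app, DEFAULT_THRESHOLD) if app else DEFAULT_THRESHOLD
    if lag_seconds > threshold:
        body = {"reason": "replication_lag", "value": lag_seconds, "threshold": threshold}
        return 429, body
    return 200, {}
```

With these assumed values, a lag of 3 seconds lets `online-ddl` proceed but rejects `mass-purge`, which has the stricter threshold.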

Subtask-size tuning¶

Collaborative throttlers work on the check-then-work cycle: each subtask is atomic from the throttler's point of view. Subtask size is a trade-off between two constraints:

Small enough that one subtask cannot single-handedly tank the database between checks.
Large enough that per-check overhead is amortised.

A typical choice is 100 rows per subtask for row-based migration workloads. A 10M-row migration is then 100,000 subtasks and at least 100,000 throttler checks.
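
The arithmetic can be reproduced with a short calculation. The 1 ms per-check latency is an assumed figure for illustration only, and `check_budget` is a hypothetical helper name.

```python
def check_budget(total_rows: int, rows_per_subtask: int, check_latency_ms: int):
    """Return (subtasks, minimum checks, seconds spent on checks if none are rejected)."""
    # Integer ceiling division: a partial final batch still counts as a subtask.
    subtasks = (total_rows + rows_per_subtask - 1) // rows_per_subtask
    min_checks = subtasks  # at least one accepted check per subtask
    overhead_s = min_checks * check_latency_ms / 1000
    return subtasks, min_checks, overhead_s


subtasks, checks, overhead_s = check_budget(10_000_000, 100, check_latency_ms=1)
print(subtasks, checks, overhead_s)
```

Under that assumption, the 10M-row migration spends about 100 seconds on checks in total, and every rejection adds further checks on top of the minimum.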

Backoff policy¶

Clients typically implement:

Bounded exponential backoff on repeated rejections (1 s → 2 s → 4 s → capped at a ceiling).
Jitter to desynchronise concurrent clients after a release moment (see concepts/oversampling-metric-interval for the release-thundering-herd problem).
Check frequency close to the metric's sampling interval; checking faster than the metric updates only returns the same answer again and is wasted effort.
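
The first two points can be sketched as a delay function. The 30-second ceiling and the particular jitter scheme (a uniform draw between half the bound and the full bound) are assumptions of this example, not values from the source.

```python
import random
from typing import Optional


def next_delay(
    rejections: int,
    base: float = 1.0,
    ceiling: float = 30.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Delay before the next check after `rejections` consecutive rejections (>= 1)."""
    rng = rng if rng is not None else random.Random()
    # Exponential bound: 1 s, 2 s, 4 s, ... capped at the ceiling.
    bound = min(ceiling, base * 2 ** (rejections - 1))
    # Jitter: spread clients across the upper half of the bound so they
    # do not all re-check at the same instant.
    return rng.uniform(bound / 2, bound)
```

Keeping the lower edge at half the bound stops a client from re-checking almost immediately, which a jitter range starting at zero would allow.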

Non-compliance¶

A client that ignores the throttler can inflict the exact load the throttler was installed to prevent. Mitigations:

Audit logs of client check frequency and compliance.
Detection of misbehaving clients (by IP or service identifier) through unusual database load that does not correlate with throttler rejections.
Escalation to barrier throttling for demonstrated non-compliance.
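
One way to approach the audit is to compare, per client, the number of accepted checks against the number of subtasks actually executed. The sketch below assumes both the throttler log and a database-side audit log can be reduced to streams of client identifiers; that data shape and the name `find_noncompliant` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def find_noncompliant(
    accepted_checks: Iterable[str],
    executed_subtasks: Iterable[str],
) -> List[Tuple[str, int]]:
    """Return (client_id, excess) for clients that ran more subtasks than accepted checks."""
    granted = Counter(accepted_checks)
    executed = Counter(executed_subtasks)
    # A compliant client never executes more subtasks than it had checks accepted.
    excess = {c: n - granted[c] for c, n in executed.items() if n > granted[c]}
    return sorted(excess.items(), key=lambda kv: kv[1], reverse=True)
```

A client that never calls the check API at all appears here with zero accepted checks, so all of its work counts as excess.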

In practice, collaborative throttlers are deployed where only known, controlled clients (internal tooling, managed migration systems) interact with the throttler. Untrusted / external clients get a barrier.

Seen in¶

sources/2026-04-21-planetscale-anatomy-of-a-throttler-part-1 — canonical framing. Noach contrasts collaborative with barrier and notes that the discussion focuses on collaborative (which is the default for the workloads he cares about).
systems/vitess-throttler — canonical collaborative throttler in the MySQL/Vitess ecosystem. VReplication, VDiff, MoveTables, and online DDL all use the collaborative check API before each subtask.

Related¶

concepts/database-throttler — parent primitive.
patterns/multi-metric-throttling — complementary pattern; check API typically fronts a multi-metric decision function.
patterns/heartbeat-based-replication-lag-measurement — the measurement mechanism whose output the check API exposes.
patterns/throttler-observability-substitute — consequence of the collaborative model: the throttler check log becomes the observability surface for throttled work.
concepts/backpressure — parent concept; check API is the explicit backpressure handshake.