CONCEPT Cited by 1 source
Throttler fail-open vs fail-closed
Definition
When a throttler is unreachable — crashed, partitioned, restarting — clients must decide locally whether to proceed with full power (fail-open) or to consider themselves rejected (fail-closed). The decision is made at the client because there is no throttler to consult.
"If no throttler is available, should clients consider themselves rejected, or should they proceed with full power?"
— Shlomi Noach, Source: sources/2026-04-21-planetscale-anatomy-of-a-throttler-part-2
The standard pragmatic middle
Neither pure fail-open nor pure fail-closed works well in practice. The canonical compromise is a bounded-wait strategy:
"A common approach is to hold off up to some timeout and, from there on, proceed unthrottled, taking into consideration the possibility that the throttler may not be up for a while."
Structure:
- On throttler unreachable → retry for T_wait seconds.
- After T_wait → fail-open: proceed with the workload unthrottled.
T_wait is the tunable: long enough to ride through a
restart or brief partition, short enough that a long outage
doesn't stall foreground work.
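The bounded-wait structure above can be sketched as a small client-side helper. This is a minimal illustration, not the source's or Vitess's implementation: the names `may_proceed`, `ThrottlerUnavailable`, `check`, and `retry_interval` are hypothetical, and the clock and sleep functions are injectable only so the behavior can be exercised without real waiting.

```python
import time
from typing import Callable


class ThrottlerUnavailable(Exception):
    """Raised by the (hypothetical) check function when no throttler answers."""


def may_proceed(
    check: Callable[[], bool],
    t_wait: float,
    retry_interval: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Bounded wait, then fail-open.

    Returns the throttler's verdict if it answers within t_wait seconds.
    If it stays unreachable for the whole window, returns True so the
    caller proceeds unthrottled.
    """
    deadline = clock() + t_wait
    while True:
        try:
            # Throttler reachable: honor its answer, accept or reject.
            return check()
        except ThrottlerUnavailable:
            remaining = deadline - clock()
            if remaining <= 0:
                # Waited the full T_wait: fail-open.
                return True
            # Keep retrying, but never sleep past the deadline.
            sleep(min(retry_interval, remaining))
```

A caller would wrap each unit of work in `if may_proceed(check, t_wait=...)`. Note that a rejection from a reachable throttler is returned as-is; only unreachability triggers the fail-open path.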
When to lean fail-open
- Workloads where stopping the client is worse than momentarily overloading the protected resource.
- Background-job workers whose backlog will grow unboundedly if they halt.
- Systems with independent resource protection downstream (e.g. database connection pools, admission control in the data-plane itself).
When to lean fail-closed
- Workloads where the throttler guards a catastrophic side-effect (writes to a shared replication topology with cascading-lag risk).
- Workloads where caller retry is cheap and safe.
- Systems without any secondary protection — the throttler is the only backstop.
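One way to make the lean explicit is to treat it as a per-workload setting that applies only when no throttler verdict is available. The sketch below is an assumption for illustration: `OutagePolicy`, `resolve`, and the workload names in `POLICY_BY_WORKLOAD` are hypothetical and are not taken from the source.

```python
from enum import Enum
from typing import Optional


class OutagePolicy(Enum):
    FAIL_OPEN = "fail-open"      # proceed unthrottled when no throttler answers
    FAIL_CLOSED = "fail-closed"  # treat an unreachable throttler as a rejection


def resolve(verdict: Optional[bool], policy: OutagePolicy) -> bool:
    """Return True if the client may proceed.

    verdict is the throttler's answer, or None if no throttler was reachable.
    The policy is consulted only in the None case.
    """
    if verdict is not None:
        return verdict
    return policy is OutagePolicy.FAIL_OPEN


# Hypothetical mapping that follows the criteria listed above.
POLICY_BY_WORKLOAD = {
    "background-job-worker": OutagePolicy.FAIL_OPEN,     # backlog grows if halted
    "shared-topology-writer": OutagePolicy.FAIL_CLOSED,  # cascading-lag risk
}
```

Keeping the policy separate from the check itself means a reachable throttler always wins, and the fail-open or fail-closed choice stays a local, auditable default.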
Interaction with throttler hibernation
Hibernation turns the fail-open versus fail-closed choice into a normal-operation concern, not only an outage concern. The first checks after an idle period see stale data, so the throttler may wrongly reject them. A fail-closed client strategy then forces extra retries, while a fail-open strategy risks a burst based on stale data. Vitess's design assumes client retry as the compensation either way.
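The client-retry compensation can be pictured as a short, bounded retry loop around the check. This is only a sketch under assumptions: `check_with_retry`, `attempts`, and `delay` are hypothetical names, and the source does not specify how many retries or what spacing a client should use.

```python
import time
from typing import Callable


def check_with_retry(
    check: Callable[[], bool],
    attempts: int = 3,
    delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry a rejected check a few times before giving up.

    A rejection right after an idle period may reflect stale data, so a
    later attempt can succeed once the throttler has refreshed its view.
    """
    for attempt in range(attempts):
        if check():
            return True
        # Pause between attempts, but not after the final one.
        if attempt < attempts - 1:
            sleep(delay)
    return False
```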
Seen in
- sources/2026-04-21-planetscale-anatomy-of-a-throttler-part-2 — canonical wiki introduction. Shlomi Noach frames the decision as unavoidable in any HA throttler deployment and describes the bounded-wait-then-fail-open pragmatic compromise.