CONCEPT Cited by 1 source

Automated circuit breaker with partial-open recovery state¶

Definition¶

The automated circuit breaker with partial-open recovery state is a refinement of the classical circuit breaker tuned for LLM-serving routing layers. It introduces a recovery state between OPEN and CLOSED, a partial-open ramp that generalises the classical HALF-OPEN probe: a "controlled trickle" of requests reaches a recovering endpoint, and the trickle expands as the endpoint demonstrates sustained health, until the breaker is fully CLOSED.

The wiki's canonical framing comes from Slack's 2026-05-28 multi-cloud retrospective (Source: sources/2026-05-28-slack-slack-ai-the-path-to-multi-cloud):

"To move beyond manual failovers, we implemented an automated Circuit Breaker pattern. This system acts as a real-time watchdog, constantly monitoring health signals at the endpoint level. If a specific provider or model begins to exhibit signs of distress – such as an elevated Time to First Token (TTFT), a spike in 5xx error rates, or crossing a latency p90 threshold – the circuit 'trips.' Once tripped, the routing layer automatically diverts traffic to a healthy alternative model based on the use case and complexity. Crucially, the breaker enters a partial-open state, allowing a small, controlled trickle of requests to reach the degraded endpoint. As the endpoint demonstrates sustained health, the system dynamically expands this trickle, incrementally ramping traffic back up until the breaker is fully 'closed' and normal operations resume. This ensures a graceful recovery without overwhelming a stabilizing service."

Three named health signals¶

Slack discloses three real-time signals that trip the breaker:

Time to First Token (TTFT) — the LLM-specific latency primitive. Captures both queuing delay and prefill cost.
5xx error rate spike — classical server-error rate threshold.
p90 latency threshold — distribution-aware signal that doesn't wait for full request failures, just for the distribution to shift.

The combination is LLM-specific: TTFT has no direct counterpart in classical non-streaming RPC services, and p90 latency is more informative than mean latency for LLM workloads with high inherent variance.
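A minimal sketch of how the three signals might be evaluated over a window of recent requests. The `Sample` and `Thresholds` types, the threshold values, and the use of median TTFT as the measure of "elevated TTFT" are all assumptions for illustration; Slack discloses neither its thresholds nor how it aggregates the signals.

```python
from dataclasses import dataclass
from statistics import median, quantiles


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical values; Slack does not disclose its thresholds.
    ttft_s: float = 2.0           # median time to first token, seconds
    latency_p90_s: float = 30.0   # p90 end-to-end latency, seconds
    error_rate_5xx: float = 0.05  # fraction of requests returning 5xx


@dataclass(frozen=True)
class Sample:
    ttft_s: float     # time to first token, seconds
    latency_s: float  # total request latency, seconds
    status: int       # HTTP status code


def p90(values):
    # quantiles(n=10) returns 9 cut points; the last is the 90th percentile.
    return quantiles(values, n=10, method="inclusive")[-1]


def tripped_signals(window, t=Thresholds()):
    """Return the names of the signals that exceed their threshold."""
    if len(window) < 2:
        return []  # too little data to judge
    tripped = []
    if median(s.ttft_s for s in window) > t.ttft_s:
        tripped.append("ttft")
    errors = sum(1 for s in window if 500 <= s.status <= 599)
    if errors / len(window) > t.error_rate_5xx:
        tripped.append("5xx_rate")
    if p90([s.latency_s for s in window]) > t.latency_p90_s:
        tripped.append("latency_p90")
    return tripped
```

In this sketch any non-empty result would trip the breaker. Whether Slack trips on any single signal or on a combination is not stated in the source.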

Why partial-open recovery matters¶

The classical circuit breaker has only OPEN / HALF-OPEN / CLOSED. The HALF-OPEN state typically allows a single probe (or small fixed number) before deciding to close. For LLM serving, this is too coarse:

Cold-start surge: fully closing the breaker routes 100% of traffic back to a recovering endpoint instantly, which can re-saturate the same shared pool that just degraded.
Recovery is gradual: the recovering endpoint's capacity comes back over time as backpressure clears, not in one step. Ramping traffic back at a matching rate avoids re-tripping the breaker.
Inference engines have warmup state: KV caches, speculative-decoding draft models, and prefix-cache locality all benefit from a gradual ramp rather than a step function.

The partial-open ramp generalises HALF-OPEN into a continuous-state recovery profile.

Mechanism (sketched from Slack disclosure)¶

Slack does not disclose specific numerical thresholds, but the mechanism shape is clear:

Endpoint state machine:

CLOSED              ──[health degrades]──▶  OPEN
  │                                          │
  │                                  [cooldown]
  │                                          ▼
  │                                  PARTIAL-OPEN
  │                                  (trickle starts)
  │                                          │
  │                                  [sustained health]
  │                                          ▼
  │                                  PARTIAL-OPEN
  │                                  (trickle expands)
  │                                          │
  │                                  [sustained health]
  │                                          ▼
  └──────[fully recovered]───────── CLOSED

The dynamic expansion of the trickle distinguishes the pattern from classical HALF-OPEN, which admits a single probe (or a small fixed number of probes) rather than a ramp.
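The state machine above can be sketched as a small class that tracks the share of traffic an endpoint is allowed to receive. This is an illustration under stated assumptions, not Slack's implementation: the class name, the cooldown, the initial share, the multiplicative growth factor, the number of healthy evaluations per expansion step, and the re-trip from PARTIAL-OPEN back to OPEN are all hypothetical.

```python
import enum


class State(enum.Enum):
    CLOSED = "closed"
    OPEN = "open"
    PARTIAL_OPEN = "partial_open"


class PartialOpenBreaker:
    """Breaker whose recovery state admits a growing share of traffic."""

    def __init__(self, cooldown_s=30.0, initial_share=0.05,
                 growth=2.0, healthy_ticks_per_step=3):
        # All four parameters are hypothetical; Slack discloses none of them.
        self.cooldown_s = cooldown_s
        self.initial_share = initial_share
        self.growth = growth
        self.healthy_ticks_per_step = healthy_ticks_per_step
        self.state = State.CLOSED
        self.share = 1.0  # fraction of traffic this endpoint may receive
        self._opened_at = 0.0
        self._healthy_ticks = 0

    def tick(self, healthy, now):
        """Advance the state machine with one health evaluation at time `now`."""
        if self.state is State.OPEN:
            # No traffic flows while OPEN, so only the cooldown timer matters.
            if now - self._opened_at >= self.cooldown_s:
                self.state = State.PARTIAL_OPEN
                self.share = self.initial_share  # trickle starts
                self._healthy_ticks = 0
            return
        if not healthy:
            # Trip from CLOSED, or re-trip from PARTIAL_OPEN (an assumption).
            self.state, self.share = State.OPEN, 0.0
            self._opened_at = now
            return
        if self.state is State.PARTIAL_OPEN:
            self._healthy_ticks += 1
            if self._healthy_ticks >= self.healthy_ticks_per_step:
                # Sustained health: expand the trickle.
                self._healthy_ticks = 0
                self.share = min(1.0, self.share * self.growth)
                if self.share >= 1.0:
                    self.state = State.CLOSED

    def admits(self, roll):
        """Decide one request; `roll` is a uniform random number in [0, 1)."""
        return roll < self.share
```

A caller would feed `tick` the result of each health evaluation and pass `random.random()` to `admits` for each request, diverting rejected requests to an alternative endpoint.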

Composition with neighbouring concepts¶

Concept	Relationship
patterns/circuit-breaker (classical)	Direct refinement: adds the partial-open ramp state.
concepts/thundering-herd	The failure mode the partial-open ramp specifically prevents at recovery time.
concepts/cascading-failure	Sibling concern: gradual ramp prevents the recovering endpoint from re-cascading.
concepts/multi-cloud-llm-serving	The architectural posture the breaker operates inside.
concepts/api-normalization-multi-cloud-llm	The breaker requires unified error / health signals across providers, which the normalisation layer provides.
concepts/concentration-risk-single-cloud-llm	The risk the breaker mitigates by routing across providers.
HAProxy / NGINX active health check	Sibling primitive at the load-balancer altitude. The breaker operates one altitude up — at the cross-provider routing layer — but follows the same general shape.

What the breaker does NOT do¶

Fix the underlying degradation — the breaker reduces customer-visible damage during degradation but doesn't diagnose or repair the cause.
Provide global capacity — if all endpoints are degraded, the breaker has nowhere to route to. Slack's model fallback hierarchy composes the breaker with multiple designated backup models per feature.
Replace per-request retry / backoff — the breaker operates at endpoint granularity; per-request resilience remains the responsibility of retry logic.
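The second limitation can be illustrated with a sketch of how a router might walk an ordered fallback chain, taking the first endpoint whose breaker admits the request. The function name, the endpoint names, and the idea of representing each breaker by its currently allowed traffic share are assumptions made for illustration; the source does not describe how Slack's fallback hierarchy is implemented.

```python
import random


def pick_endpoint(chain, share_of, rng=random.random):
    """Return the first endpoint in `chain` whose breaker admits this request.

    `chain` is an ordered list of endpoint names, primary first.
    `share_of` maps each name to the traffic fraction its breaker
    currently allows (0.0 when open, 1.0 when closed).
    """
    for name in chain:
        if rng() < share_of.get(name, 0.0):
            return name
    # Every endpoint is shedding traffic: the breaker has nowhere to route.
    return None


# Hypothetical endpoint names: primary is fully open, so traffic diverts.
chain = ["primary-model", "backup-model-a", "backup-model-b"]
shares = {"primary-model": 0.0, "backup-model-a": 1.0, "backup-model-b": 1.0}
chosen = pick_endpoint(chain, shares)
```

When every share is zero the function returns `None`, which is the "no global capacity" case: the caller has to degrade the feature or fail the request.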

Open questions (from Slack's disclosure)¶

Specific TTFT / p90 / 5xx-rate thresholds that trip the breaker — not disclosed.
Cooldown duration before partial-open begins — not disclosed.
Trickle ramp rate — linear, exponential, custom curve? Not disclosed.
What "sustained health" means quantitatively: the duration of below-threshold continuity required for ramp expansion.
Per-endpoint vs per-feature granularity — does the breaker decision apply per-endpoint or per-feature × endpoint?
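To make the ramp-rate question concrete, the sketch below contrasts two candidate schedules for the traffic share admitted after each healthy step. Both functions and their parameter values are hypothetical; nothing in the source indicates which shape, if either, Slack uses.

```python
def linear_ramp(start, step, n):
    """Share at each of n healthy steps, growing by a fixed increment."""
    return [min(1.0, start + step * i) for i in range(n)]


def exponential_ramp(start, factor, n):
    """Share at each of n healthy steps, multiplying by a fixed factor."""
    return [min(1.0, start * factor ** i) for i in range(n)]


linear = linear_ramp(0.25, 0.25, 5)          # [0.25, 0.5, 0.75, 1.0, 1.0]
exponential = exponential_ramp(0.125, 2, 5)  # [0.125, 0.25, 0.5, 1.0, 1.0]
```

A linear schedule adds the same absolute load at every step, while an exponential one starts more cautiously and accelerates as confidence in the endpoint grows.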

Seen in¶

sources/2026-05-28-slack-slack-ai-the-path-to-multi-cloud — canonical wiki disclosure of the automated circuit breaker with partial-open recovery state as one of five subsystems inside Slack's Intelligent Routing Layer. Verbatim TTFT + p90 + 5xx triggers + partial-open trickle + dynamic expansion + sustained-health-driven ramp-back framing.