PATTERN Cited by 1 source
Circuit breaker¶
Pattern¶
A circuit breaker wraps a call to a failure-prone dependency with a state machine that trips open when the dependency is failing at a threshold rate, short-circuits subsequent calls for a cooldown window, then probes with a limited number of requests before fully closing again. When open, calls to the dependency return an immediate failure (or fallback) rather than attempting the remote call — capacity at the caller is preserved, and the dependency is not piled on while it is already struggling.
Zalando's timeouts post names the pattern as a mandatory companion to retries:
"Circuit breaker: always consider implementing circuit breakers when enabling retry. When failures are rare, that's not a problem. Retries that increase load can make matters significantly worse." (Source: sources/2023-07-25-zalando-all-you-need-to-know-about-timeouts)
State machine¶
┌────────────┐ error rate > threshold ┌────────────┐
│ CLOSED │ ────────────────────────▶ │ OPEN │
│ (normal) │ │ (fail-fast)│
└────────────┘ └────────────┘
▲ │
│ │ cooldown elapsed
│ success probe ▼
│ ┌──────────────┐
└───────────────────────────────────│ HALF-OPEN │
│ (probing) │
└──────────────┘
probe fails │
┌──────────────────────────────────────────┘
▼
(back to OPEN)
Closed. Normal operation. Failures are counted against a rolling window.
Open. Once failures exceed the threshold (e.g. 50% of the last 20 requests, or N consecutive failures), the breaker trips and all subsequent calls return immediately without invoking the dependency. No timeouts consumed, no pool threads held, no additional load on the downstream.
Half-open. After a cooldown (e.g. 30 s), the breaker allows a limited number of probe requests through. If they succeed, the breaker closes. If any fail, it goes back to open.
Why retry + circuit breaker beats either alone¶
Retry alone during a real downstream outage amplifies load: every failing request generates N retry attempts, multiplying traffic to the already-struggling downstream. Recovery stalls.
Circuit breaker alone produces hard errors during transient blips that a retry would have smoothed over.
Retry + circuit breaker: retry covers transient failures inside the breaker's threshold (closed state); breaker takes over once failures are sustained (open state). The breaker also caps the damage a runaway retry policy can do to the downstream.
Parameters that matter¶
- Failure threshold — error rate or count that trips the breaker. Too low: trips on transient noise. Too high: cascading failure before it trips.
- Rolling window — time or request-count window over which failures are measured.
- Cooldown / open state duration — how long to wait before probing. Must allow the downstream actual recovery time.
- Probe volume — number of half-open probes before fully closing; too few gives a false-recovery signal, too many hammers a still-broken downstream.
- Per-dependency granularity — a single service-wide breaker is coarse; per-downstream (or per-endpoint) breakers isolate failures properly.
Fallbacks¶
The breaker's open state is the natural moment to return a fallback:
- Cached last-known-good value.
- Default / static response.
- Degraded feature (e.g. return recommendations without personalisation).
- Hard failure surfaced to user with a "try again later" message.
A circuit breaker with no fallback is still useful — it caps the blast radius of a cascading failure — but a breaker with a fallback is how services degrade gracefully under dependency outages. See concepts/graceful-degradation.
Implementations¶
- Resilience4j — canonical JVM
implementation with
CircuitBreakermodule composing withRetry,TimeLimiter,Bulkhead,RateLimiter. - Hystrix (deprecated / archived by Netflix, 2018) — the original popular JVM implementation; superseded by Resilience4j.
- Polly (.NET), gobreaker (Go), and many in-service implementations.
Service meshes (Envoy, Istio, Linkerd) also implement circuit breakers at the proxy layer, externalising the logic from application code.
Related¶
- concepts/cascading-failure — the failure mode this pattern contains.
- concepts/thread-pool-exhaustion — the specific cascading-failure mechanism the breaker prevents by short-circuiting.
- concepts/fail-fast-principle — the design principle the breaker implements at the dependency level.
- concepts/exponential-backoff-jitter — complementary retry-scheduling discipline.
- concepts/graceful-degradation — the response-side companion once a breaker is open.
- patterns/retry-on-5xx-not-4xx — the retry policy the breaker makes safe.
- patterns/explicit-timeout-on-remote-calls — the broader resilience discipline.
- systems/resilience4j — canonical implementation.
Seen in¶
- sources/2023-07-25-zalando-all-you-need-to-know-about-timeouts — canonical wiki home. Zalando house-style mandatory companion to retry.