Skip to content

PATTERN Cited by 1 source

Circuit breaker

Pattern

A circuit breaker wraps a call to a failure-prone dependency with a state machine that trips open when the dependency is failing at a threshold rate, short-circuits subsequent calls for a cooldown window, then probes with a limited number of requests before fully closing again. When open, calls to the dependency return an immediate failure (or fallback) rather than attempting the remote call — capacity at the caller is preserved, and the dependency is not piled on while it is already struggling.

Zalando's timeouts post names the pattern as a mandatory companion to retries:

"Circuit breaker: always consider implementing circuit breakers when enabling retry. When failures are rare, that's not a problem. Retries that increase load can make matters significantly worse." (Source: sources/2023-07-25-zalando-all-you-need-to-know-about-timeouts)

State machine

    ┌────────────┐  error rate > threshold   ┌────────────┐
    │   CLOSED   │ ────────────────────────▶ │    OPEN    │
    │  (normal)  │                            │ (fail-fast)│
    └────────────┘                            └────────────┘
          ▲                                         │
          │                                         │ cooldown elapsed
          │ success probe                           ▼
          │                                   ┌──────────────┐
          └───────────────────────────────────│  HALF-OPEN   │
                                              │  (probing)   │
                                              └──────────────┘
                 probe fails                        │
          ┌──────────────────────────────────────────┘
    (back to OPEN)

Closed. Normal operation. Failures are counted against a rolling window.

Open. Once failures exceed the threshold (e.g. 50% of the last 20 requests, or N consecutive failures), the breaker trips and all subsequent calls return immediately without invoking the dependency. No timeouts consumed, no pool threads held, no additional load on the downstream.

Half-open. After a cooldown (e.g. 30 s), the breaker allows a limited number of probe requests through. If they succeed, the breaker closes. If any fail, it goes back to open.

Why retry + circuit breaker beats either alone

Retry alone during a real downstream outage amplifies load: every failing request generates N retry attempts, multiplying traffic to the already-struggling downstream. Recovery stalls.

Circuit breaker alone produces hard errors during transient blips that a retry would have smoothed over.

Retry + circuit breaker: retry covers transient failures inside the breaker's threshold (closed state); breaker takes over once failures are sustained (open state). The breaker also caps the damage a runaway retry policy can do to the downstream.

Parameters that matter

  • Failure threshold — error rate or count that trips the breaker. Too low: trips on transient noise. Too high: cascading failure before it trips.
  • Rolling window — time or request-count window over which failures are measured.
  • Cooldown / open state duration — how long to wait before probing. Must allow the downstream actual recovery time.
  • Probe volume — number of half-open probes before fully closing; too few gives a false-recovery signal, too many hammers a still-broken downstream.
  • Per-dependency granularity — a single service-wide breaker is coarse; per-downstream (or per-endpoint) breakers isolate failures properly.

Fallbacks

The breaker's open state is the natural moment to return a fallback:

  • Cached last-known-good value.
  • Default / static response.
  • Degraded feature (e.g. return recommendations without personalisation).
  • Hard failure surfaced to user with a "try again later" message.

A circuit breaker with no fallback is still useful — it caps the blast radius of a cascading failure — but a breaker with a fallback is how services degrade gracefully under dependency outages. See concepts/graceful-degradation.

Implementations

  • Resilience4j — canonical JVM implementation with CircuitBreaker module composing with Retry, TimeLimiter, Bulkhead, RateLimiter.
  • Hystrix (deprecated / archived by Netflix, 2018) — the original popular JVM implementation; superseded by Resilience4j.
  • Polly (.NET), gobreaker (Go), and many in-service implementations.

Service meshes (Envoy, Istio, Linkerd) also implement circuit breakers at the proxy layer, externalising the logic from application code.

Seen in

Last updated · 550 distilled / 1,221 read