PATTERN Cited by 1 source
Explicit timeout on remote calls¶
Pattern¶
Set both the connection timeout and the request timeout explicitly on every remote call. Never rely on library defaults; never leave a timeout unset on the theory that "it probably won't matter." Every outbound HTTP, gRPC, database, and RPC client gets both numbers written down, checked into source, and reviewed alongside the downstream it targets.
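A minimal sketch of what "both numbers written down" can look like, using only Python's standard library. The names Timeouts, INVENTORY_TIMEOUTS, and fetch, and the values 0.3 s and 2.0 s, are hypothetical and not taken from the Zalando post. http.client accepts a single timeout, so the sketch applies the connection timeout during connect and then resets the socket timeout for the request phase. That socket timeout bounds each blocking send or receive, not the total request time, so it only approximates a true request timeout.

```python
import http.client
from dataclasses import dataclass


@dataclass(frozen=True)
class Timeouts:
    """Explicit timeout pair for one downstream (hypothetical helper)."""
    connect_s: float
    request_s: float

    def __post_init__(self):
        # Refuse "unset" values so nobody falls back to a library default.
        for name in ("connect_s", "request_s"):
            value = getattr(self, name)
            if value is None or value <= 0:
                raise ValueError(f"{name} must be a positive number of seconds, got {value!r}")


# Assumed values for an imaginary "inventory" downstream; size them from measurements.
INVENTORY_TIMEOUTS = Timeouts(connect_s=0.3, request_s=2.0)


def fetch(host, port, path, timeouts):
    """GET a path with separately bounded connect and request phases."""
    conn = http.client.HTTPConnection(host, port, timeout=timeouts.connect_s)
    try:
        conn.connect()  # TCP connect is bounded by connect_s
        # From here on, each blocking socket operation is bounded by request_s.
        conn.sock.settimeout(timeouts.request_s)
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```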
Zalando's timeouts post names this as the house-style rule:
"The default timeout is your enemy, always set timeouts explicitly!" (Source: sources/2023-07-25-zalando-all-you-need-to-know-about-timeouts)
Motivation¶
Library defaults are chosen for maximum compatibility, not for production use. The Zalando post cites the canonical anti-example:
"For example for native java HttpClient the default connection/request timeout is infinite, which is unlikely within your SLA :)"
Other common offenders:
- libcurl defaults to a connect timeout of 300 s.
- Java URLConnection defaults to infinite connect and read timeouts on most JVMs.
- Many database drivers default to no query timeout.
- requests.get(url) in Python has no default timeout.
A service inheriting any of these will eventually be taken out by a single stuck downstream: calls that never return hold their threads and connections until the pool is empty. See concepts/thread-pool-exhaustion and concepts/connection-pool-exhaustion.
Shape¶
Two timeouts, sized independently:
- Connection timeout — bounded by network RTT; commonly RTT × 3.
- Request timeout — bounded by measured downstream latency percentiles via shadow-mode metric collection, sized to the chosen false-timeout rate.
Sizing the two together is a named anti-pattern: connection establishment and downstream work are bounded by different physical processes and should be configured separately.
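The two sizing rules above can be sketched as a pair of independent functions. This is an illustration under assumptions, not the method from the source: the function names are hypothetical, the latency samples stand in for whatever shadow-mode collection produces, and the percentile is picked by a simple rank rule so that at most the chosen fraction of observed calls would have timed out.

```python
from typing import Sequence


def connection_timeout_s(rtt_s: float, multiplier: float = 3.0) -> float:
    """Connection timeout derived from network round-trip time (RTT x 3 by default)."""
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    return rtt_s * multiplier


def request_timeout_s(latencies_s: Sequence[float], false_timeout_rate: float) -> float:
    """Smallest observed latency that at most `false_timeout_rate` of samples exceed."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    if not 0 <= false_timeout_rate < 1:
        raise ValueError("false_timeout_rate must be in [0, 1)")
    ordered = sorted(latencies_s)
    n = len(ordered)
    # Number of samples we accept timing out, capped so one sample always remains.
    allowed = min(int(false_timeout_rate * n), n - 1)
    return ordered[n - allowed - 1]
```

For example, with the eight hypothetical samples 0.010, 0.020, 0.030, 0.040, 0.050, 0.060, 0.070, and 0.500 seconds and a false-timeout rate of 0.25, two samples may exceed the timeout, so request_timeout_s returns 0.060. The connection timeout is computed from RTT alone and never looks at these samples.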
Enforcement¶
- Code review: the reviewer's checklist requires an explicit timeout pair on every new outbound client.
- Static analysis: lint rules flag client constructions that omit timeout configuration.
- Integration tests: assert that a dead downstream causes a fast failure (bounded wall-clock time) rather than a hang.
- Template / scaffolding: service templates ship with opinionated timeouts already set so new services inherit the discipline.
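The integration-test item above might look like the following sketch. It simulates a dead downstream with a loopback server that accepts connections and never replies, then checks that the caller gives up within a bounded wall-clock time. All function names and the 0.2 s and 2.0 s values are assumptions for illustration. For brevity the client uses a single explicit timeout rather than a separate connection and request pair.

```python
import http.client
import socket
import threading
import time


def start_silent_server():
    """Listen on a free loopback port, accept connections, and never respond."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen()
    held = []  # keep accepted sockets referenced so they stay open

    def accept_forever():
        while True:
            try:
                conn, _addr = listener.accept()
            except OSError:
                return  # listener was closed
            held.append(conn)

    threading.Thread(target=accept_forever, daemon=True).start()
    return listener, held


def call_downstream(port, timeout_s):
    """Issue one GET with an explicit timeout; return (failed, elapsed seconds)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=timeout_s)
    started = time.monotonic()
    try:
        conn.request("GET", "/")
        conn.getresponse()  # blocks until data arrives or the timeout fires
        failed = False
    except OSError:  # socket timeouts are OSError subclasses
        failed = True
    finally:
        conn.close()
    return failed, time.monotonic() - started


def test_dead_downstream_fails_fast():
    listener, held = start_silent_server()
    try:
        port = listener.getsockname()[1]
        failed, elapsed = call_downstream(port, timeout_s=0.2)
        assert failed, "a silent downstream must surface as an error"
        assert elapsed < 2.0, "the call must give up within a bounded time"
    finally:
        listener.close()
        for conn in held:
            conn.close()
```

The upper bound in the assertion is deliberately looser than the configured timeout so that scheduling jitter on a busy CI host does not make the test flaky.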
Trade-offs¶
Explicit timeouts surface latency decisions in code-review conversations rather than leaving them as accidental consequences of library defaults. The cost is that timeout numbers must be maintained: when a downstream's real p99.9 shifts, the caller's timeout must be revisited.
Related¶
- concepts/connection-timeout / concepts/request-timeout — the two bounds this pattern requires.
- concepts/fail-fast-principle — the design principle motivating the pattern.
- concepts/thread-pool-exhaustion — the failure mode unbounded waits produce.
- patterns/connection-timeout-rtt-times-three — the sizing heuristic for the connection half.
- patterns/time-limiter-wrapping-chained-calls — the extension for chained calls.
- patterns/retry-on-5xx-not-4xx / patterns/circuit-breaker — companions for what to do when a timeout fires.
Seen in¶
- sources/2023-07-25-zalando-all-you-need-to-know-about-timeouts — canonical wiki home. Zalando house-style rule.