CONCEPT
False-timeout rate¶
Definition¶
The false-timeout rate is the tunable fraction of requests that will be cut off by a timeout even though the downstream would have eventually succeeded. It is the explicit parameter that converts timeout sizing from guesswork into a single design decision: choose the rate, then read the corresponding timeout off the measured latency distribution.
Zalando's timeouts post formalises the workflow:
"After collecting latency metrics such as p50, p99, p99.9 you can define the so-called acceptable rate of false timeouts. Let's say you go with a false timeout rate 0.1% that means the max timeout you can set is p99.9 corresponding latency percentile on the downstream service." (Source: sources/2023-07-25-zalando-all-you-need-to-know-about-timeouts)
The mapping¶
| Acceptable false-timeout rate | Set request timeout to |
|---|---|
| 10% | p90 |
| 1% | p99 |
| 0.1% | p99.9 |
| 0.01% | p99.99 |
Lower tolerance → higher percentile → larger timeout → more pool resources held when the downstream is slow. Higher tolerance → tighter timeout → faster detection of slow downstreams, but more user-visible timeout errors under normal operation.
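The mapping above is just a quantile lookup over measured latency samples. A minimal sketch of that lookup (the function name and synthetic samples are illustrative, not from the source):

```python
def timeout_for_false_rate(latencies_ms, false_timeout_rate):
    """Return the timeout (ms) implied by an acceptable false-timeout rate.

    A rate of 0.001 (0.1%) maps to the p99.9 latency: only 0.1% of the
    observed requests took longer than the returned value, so cutting
    requests off there falsely times out ~0.1% of them.
    """
    ordered = sorted(latencies_ms)
    # Index of the (1 - rate) quantile, e.g. rate 0.01 -> p99.
    idx = min(len(ordered) - 1, int(len(ordered) * (1 - false_timeout_rate)))
    return ordered[idx]

# 10,000 synthetic samples: most requests fast, a long tail of slow ones.
samples = [10] * 9900 + [50] * 90 + [200] * 9 + [1000]
print(timeout_for_false_rate(samples, 0.01))   # -> 50  (p99)
print(timeout_for_false_rate(samples, 0.001))  # -> 200 (p99.9)
```

In production these samples come from shadow-mode metric collection, not an SLA document; the lookup is only as trustworthy as the percentiles feeding it.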
Why making the rate explicit matters¶
A team that has not named its false-timeout target is still picking one implicitly: whichever timeout number was written down at integration time encodes a rate, just an unexamined one. Making it explicit forces three useful conversations:
- Downstream tail shape: you need real p99 / p99.9 numbers from shadow-mode metric collection, not an SLA document. The Zalando post is emphatic: "The SLA value is good enough only for starting to test real latency."
- Caller's own SLA: if downstream p99.9 exceeds the caller's SLA, no false-timeout rate achievable by sizing alone is tolerable — the design must change (retries, hedging, fallback, removing the dependency).
- Retry trade-off: a lower timeout + retry increases load on a struggling downstream but captures users who would otherwise see a hard failure. Picking the rate forces this trade-off into the open.
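The retry trade-off in the last bullet can be roughly quantified. A sketch under the simplifying assumption that attempt outcomes are independent (which breaks down precisely when the downstream is struggling, so treat this as a lower bound on added load):

```python
def retry_tradeoff(false_rate, retries=1):
    """Estimate the effect of a tighter timeout (higher false-timeout rate)
    combined with retries, assuming independent attempt outcomes.

    Returns (residual_failure, extra_load):
      residual_failure - share of requests that still time out on every attempt
      extra_load       - expected extra attempts per request sent downstream
    """
    residual_failure = false_rate ** (retries + 1)
    # Expected extra attempts: rate + rate^2 + ... (one per preceding failure).
    extra_load = sum(false_rate ** k for k in range(1, retries + 1))
    return residual_failure, extra_load

# Tight p90 timeout (10% false-timeout rate) with one retry:
fail, extra = retry_tradeoff(0.10, retries=1)
print(fail)   # ~0.01: about 1% of requests still time out after the retry
print(extra)  # 0.1: 10% more calls sent to the downstream
```

The calculation makes the tension concrete: a p90 timeout plus one retry rescues most would-be failures, at the cost of measurably more traffic on a downstream that may already be slow.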
The trade-off with chained calls¶
When a caller with its own SLA fans out across N sequential downstreams, a per-call 0.1% false-timeout rate is not composable — the aggregate experienced-timeout rate grows with N. This is the problem that concepts/time-budget-sharing and patterns/time-limiter-wrapping-chained-calls both solve, with different trade-offs on the per-call false-timeout rate.
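The growth with N can be made explicit: if each of N sequential calls independently false-times-out with rate r, the chance that at least one does is 1 - (1 - r)^N. Independence is an assumption here; correlated slowness across downstreams changes the picture:

```python
def aggregate_false_timeout_rate(per_call_rate, n_calls):
    """Probability that at least one of n sequential downstream calls is
    falsely timed out, assuming independent per-call outcomes."""
    return 1 - (1 - per_call_rate) ** n_calls

# A per-call 0.1% rate across 5 chained downstreams compounds to ~0.5%,
# roughly 5x the per-call rate the team thought it had chosen.
print(aggregate_false_timeout_rate(0.001, 5))
```

For small rates the aggregate is close to N x r, which is why a per-call target that looked acceptable in isolation quietly blows the caller's own budget as the chain grows.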
Seen in¶
- sources/2023-07-25-zalando-all-you-need-to-know-about-timeouts — names false-timeout rate as the explicit tunable behind percentile-driven timeout sizing, with 0.1% → p99.9 as the worked example.
Related¶
- concepts/request-timeout — the bound this rate sizes.
- concepts/shadow-mode-metric-collection — the measurement discipline that yields trustworthy percentiles.
- concepts/time-budget-sharing — where the per-call rate gets degraded in exchange for a hard SLA.
- concepts/tail-latency-at-scale — the distributional framing for why the rate matters under fan-out.
- patterns/explicit-timeout-on-remote-calls — the rule that requires picking a rate in the first place.