CONCEPT Cited by 1 source

False-timeout rate¶

Definition¶

The false-timeout rate is the tunable fraction of requests that will be cut off by a timeout even though the downstream would have eventually succeeded. It is the explicit parameter that converts timeout sizing from guesswork into a single design decision: choose the rate, then read the corresponding timeout off the measured latency distribution.
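The step of reading the timeout off the measured distribution can be sketched as a nearest-rank percentile lookup over recorded latencies. The function name `timeout_for_false_timeout_rate` and the synthetic samples are illustrative assumptions, not something the source prescribes; in practice the percentiles would normally come from a metrics backend.

```python
import math


def timeout_for_false_timeout_rate(latencies_ms, rate):
    """Return the smallest observed latency that at most `rate` of samples exceed.

    `latencies_ms` is a list of measured downstream latencies (hypothetical input);
    `rate` is the acceptable false-timeout rate as a fraction, e.g. 0.001 for 0.1%.
    """
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    if not 0 <= rate < 1:
        raise ValueError("rate must be in [0, 1)")
    ordered = sorted(latencies_ms)
    n = len(ordered)
    # Number of samples we accept cutting off; the epsilon guards against
    # float rounding such as 0.07 * 100 landing just off an integer.
    allowed_over = math.floor(rate * n + 1e-9)
    # Nearest-rank percentile: keep everything up to and including this rank.
    rank = max(1, n - allowed_over)
    return ordered[rank - 1]


samples = list(range(1, 1001))  # synthetic data: 1 ms .. 1000 ms
print(timeout_for_false_timeout_rate(samples, 0.001))  # 999
```

With 1000 samples and a 0.1% rate, exactly one sample (the slowest) is allowed to exceed the timeout, so the function returns the p99.9 value of this data set.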

Zalando's timeouts post formalises the workflow:

"After collecting latency metrics such as p50, p99, p99.9 you can define the so-called acceptable rate of false timeouts. Let's say you go with a false timeout rate 0.1% that means the max timeout you can set is p99.9 corresponding latency percentile on the downstream service." (Source: sources/2023-07-25-zalando-all-you-need-to-know-about-timeouts)

The mapping¶

Acceptable false-timeout rate	Set request timeout to
10%	p90
1%	p99
0.1%	p99.9
0.01%	p99.99

Lower tolerance → higher percentile → larger timeout → more connection-pool and thread-pool capacity held while the downstream is slow. Higher tolerance → lower percentile → tighter timeout → faster detection of slow downstreams, but more user-visible timeout errors under normal operation.
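The table is plain arithmetic: the percentile is 100 minus the acceptable rate in percent. A minimal sketch, using `Decimal` so that labels such as p99.9 come out exact; the helper name `percentile_for_rate` is a hypothetical one chosen for illustration.

```python
from decimal import Decimal


def percentile_for_rate(rate_percent: str) -> str:
    """Map an acceptable false-timeout rate in percent (e.g. "0.1") to a percentile label."""
    rate = Decimal(rate_percent)
    if not Decimal(0) < rate < Decimal(100):
        raise ValueError("rate must be strictly between 0 and 100 percent")
    # Decimal subtraction avoids binary floating-point noise in the label.
    return f"p{Decimal(100) - rate}"


for rate in ("10", "1", "0.1", "0.01"):
    print(f"{rate}% -> {percentile_for_rate(rate)}")
```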

Why making the rate explicit matters¶

A team that has not named its false-timeout target is implicitly picking one through whichever timeout number was written down at integration time. Making it explicit forces three useful conversations:

Downstream tail shape: you need real p99 / p99.9 numbers from shadow-mode metric collection, not an SLA document. The Zalando post is emphatic: "The SLA value is good enough only for starting to test real latency."
Caller's own SLA: if downstream p99.9 exceeds the caller's SLA, any timeout that fits inside the SLA cuts off more than 0.1% of requests that would have succeeded, so sizing alone cannot deliver the target rate and the design must change (retries, hedging, fallback, removing the dependency).
Retry trade-off: a lower timeout + retry increases load on a struggling downstream but captures users who would otherwise see a hard failure. Picking the rate forces this trade-off into the open.
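The caller-SLA conversation above can be made concrete by measuring what false-timeout rate results if the timeout is capped at the caller's SLA. This is a sketch under assumptions: the function names, the SLA values, and the synthetic latency samples are all hypothetical.

```python
def false_timeout_rate_at(latencies_ms, timeout_ms):
    """Fraction of observed requests that a timeout of `timeout_ms` would have cut off."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    cut_off = sum(1 for latency in latencies_ms if latency > timeout_ms)
    return cut_off / len(latencies_ms)


def sizing_alone_is_enough(latencies_ms, caller_sla_ms, target_rate):
    """True if a timeout equal to the caller's SLA stays within the target rate."""
    return false_timeout_rate_at(latencies_ms, caller_sla_ms) <= target_rate


# Synthetic downstream: 99% of calls take 10 ms, 1% take 500 ms.
samples = [10] * 990 + [500] * 10

print(false_timeout_rate_at(samples, 200))          # 0.01
print(sizing_alone_is_enough(samples, 200, 0.001))  # False
print(sizing_alone_is_enough(samples, 600, 0.001))  # True
```

In the second call the caller's 200 ms SLA sits below the downstream's slow tail, so a 0.1% target cannot be met by choosing a timeout; that is the case where the design has to change.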

The trade-off with chained calls¶

When a caller with its own SLA calls N downstreams in sequence, a per-call 0.1% false-timeout rate does not compose: the chance that at least one call in the chain times out grows with N, to roughly N × 0.1% for small N if the calls time out independently. This is the problem that concepts/time-budget-sharing and patterns/time-limiter-wrapping-chained-calls both solve, with different trade-offs on the per-call false-timeout rate.
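How the rate compounds along a chain can be sketched with basic probability. The sketch assumes that each call times out independently of the others, which the source does not state and which real systems often violate (a slow shared dependency makes timeouts correlated); the function names are hypothetical.

```python
def aggregate_timeout_rate(per_call_rate, n_calls):
    """Probability that at least one of `n_calls` sequential calls hits its timeout,
    assuming independent calls with the same per-call false-timeout rate."""
    return 1 - (1 - per_call_rate) ** n_calls


def per_call_rate_for_aggregate(aggregate_rate, n_calls):
    """Per-call rate needed so the whole chain stays at `aggregate_rate` (same assumption)."""
    return 1 - (1 - aggregate_rate) ** (1 / n_calls)


for n in (1, 5, 10, 50):
    print(n, f"{aggregate_timeout_rate(0.001, n):.4%}")
```

At a per-call rate of 0.1%, a chain of 10 calls already sees just under 1% of requests hit a timeout somewhere. Conversely, keeping the whole chain at 0.1% requires each call to be sized at roughly 0.01%, one row further down the mapping table.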

Seen in¶

sources/2023-07-25-zalando-all-you-need-to-know-about-timeouts — names false-timeout rate as the explicit tunable behind percentile-driven timeout sizing, with 0.1% → p99.9 as the worked example.

concepts/request-timeout — the bound this rate sizes.
concepts/shadow-mode-metric-collection — the measurement discipline that yields trustworthy percentiles.
concepts/time-budget-sharing — where the per-call rate gets degraded in exchange for a hard SLA.
concepts/tail-latency-at-scale — the distributional framing for why the rate matters under fan-out.
patterns/explicit-timeout-on-remote-calls — the rule that requires picking a rate in the first place.