
CONCEPT Cited by 1 source

False-timeout rate

Definition

The false-timeout rate is the tunable fraction of requests that will be cut off by a timeout even though the downstream would have eventually succeeded. It is the explicit parameter that converts timeout sizing from guesswork into a single design decision: choose the rate, then read the corresponding timeout off the measured latency distribution.

Zalando's timeouts post formalises the workflow:

"After collecting latency metrics such as p50, p99, p99.9 you can define the so-called acceptable rate of false timeouts. Let's say you go with a false timeout rate 0.1% that means the max timeout you can set is p99.9 corresponding latency percentile on the downstream service." (Source: sources/2023-07-25-zalando-all-you-need-to-know-about-timeouts)

The mapping

| Acceptable false-timeout rate | Set request timeout to |
|---|---|
| 10% | p90 |
| 1% | p99 |
| 0.1% | p99.9 |
| 0.01% | p99.99 |

Lower tolerance → higher percentile → larger timeout → more pool resources held when the downstream is slow. Higher tolerance → tighter timeout → faster detection of slow downstreams, but more user-visible timeout errors under normal operation.
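
The mapping can be sketched in a few lines of Python. This is a minimal illustration, not code from the Zalando post: the function name `timeout_for_false_timeout_rate` and the input `latencies_ms` (a list of measured latencies of successful downstream calls, in milliseconds) are hypothetical, and the percentile is computed by simple rank selection rather than by any particular metrics library.

```python
import math


def timeout_for_false_timeout_rate(latencies_ms, rate):
    """Return the smallest observed latency that cuts off at most `rate` of the samples.

    `latencies_ms` is assumed to hold latencies of requests that eventually
    succeeded; `rate` is the acceptable false-timeout rate (0.001 for 0.1%).
    """
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    if not 0 < rate < 1:
        raise ValueError("rate must be strictly between 0 and 1")

    ordered = sorted(latencies_ms)
    n = len(ordered)
    # How many samples we accept cutting off. round() guards against
    # float noise such as 0.29 * 100 == 28.999999999999996.
    allowed_to_exceed = min(math.floor(round(rate * n, 9)), n - 1)
    # Everything above this sample would have been a false timeout.
    return ordered[n - allowed_to_exceed - 1]


if __name__ == "__main__":
    # Synthetic samples: 1 ms, 2 ms, ..., 1000 ms (illustrative only).
    samples = list(range(1, 1001))
    for rate in (0.1, 0.01, 0.001):
        print(f"rate={rate}: timeout={timeout_for_false_timeout_rate(samples, rate)} ms")
```

With too few samples the function simply returns the maximum observed latency, which is a hint that the tail has not been measured well enough to support the chosen rate.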

Why making the rate explicit matters

A team that has not named its false-timeout target is implicitly picking one through whichever timeout number was written down at integration time. Making it explicit forces three useful conversations:

  1. Downstream tail shape: you need real p99 / p99.9 numbers from shadow-mode metric collection, not an SLA document. The Zalando post is emphatic: "The SLA value is good enough only for starting to test real latency."
  2. Caller's own SLA: if downstream p99.9 exceeds the caller's SLA, no timeout that fits inside the SLA can hold the false-timeout rate at 0.1%, so sizing alone cannot fix it and the design must change (retries, hedging, fallback, removing the dependency).
  3. Retry trade-off: a lower timeout + retry increases load on a struggling downstream but captures users who would otherwise see a hard failure. Picking the rate forces this trade-off into the open.
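
The first two conversations can be made concrete by running the mapping in reverse: take a timeout that already exists (or the caller's SLA) and read off the false-timeout rate it implies. The sketch below assumes the same kind of input as before, a hypothetical list `latencies_ms` of measured latencies for downstream calls that eventually succeeded; both function names are illustrative.

```python
def implied_false_timeout_rate(latencies_ms, timeout_ms):
    """Fraction of eventually-successful requests that `timeout_ms` would cut off."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    cut_off = sum(1 for latency in latencies_ms if latency > timeout_ms)
    return cut_off / len(latencies_ms)


def sizing_alone_suffices(latencies_ms, caller_sla_ms, target_rate):
    """True if a timeout no larger than the caller's SLA can meet `target_rate`.

    Simplification: the whole SLA is treated as available to this one
    downstream call, ignoring the caller's own processing time.
    """
    return implied_false_timeout_rate(latencies_ms, caller_sla_ms) <= target_rate


if __name__ == "__main__":
    # Synthetic samples: 1 ms, 2 ms, ..., 1000 ms (illustrative only).
    samples = list(range(1, 1001))
    print(implied_false_timeout_rate(samples, 990))    # rate implied by a 990 ms timeout
    print(sizing_alone_suffices(samples, 990, 0.001))  # can a 990 ms SLA meet 0.1%?
```

If `sizing_alone_suffices` returns `False`, no choice of timeout inside the SLA reaches the target rate, which is the signal that the design has to change.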

The trade-off with chained calls

When a caller with its own SLA calls N downstreams in sequence, a per-call 0.1% false-timeout rate does not compose: the chance that at least one call in the chain times out grows with N (roughly N × 0.1% for small rates, if the calls time out independently). This is the problem that concepts/time-budget-sharing and patterns/time-limiter-wrapping-chained-calls both solve, with different trade-offs on the per-call false-timeout rate.
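
How fast the aggregate rate grows can be estimated with a short calculation. The sketch below rests on an assumption that is not in the source: each of the N calls times out independently with the same per-call rate, so the chance that at least one times out is 1 - (1 - r)^N. Real downstreams often slow down together, so treat the result as a rough guide. The function names are hypothetical.

```python
def aggregate_timeout_rate(per_call_rate, n_calls):
    """Chance that at least one of `n_calls` sequential calls hits a false timeout.

    Assumes independent calls that share the same per-call rate.
    """
    return 1 - (1 - per_call_rate) ** n_calls


def per_call_rate_for_aggregate(aggregate_rate, n_calls):
    """Per-call rate needed so the whole chain stays at `aggregate_rate`.

    Inverse of aggregate_timeout_rate under the same independence assumption.
    """
    return 1 - (1 - aggregate_rate) ** (1 / n_calls)


if __name__ == "__main__":
    # Five chained calls, each sized at a 0.1% false-timeout rate.
    print(f"{aggregate_timeout_rate(0.001, 5):.4%}")
    # Per-call rate needed to keep the whole chain at 0.1%.
    print(f"{per_call_rate_for_aggregate(0.001, 5):.4%}")
```

Under these assumptions, five chained calls at 0.1% each give the caller an experienced rate of just under 0.5%, and holding the chain at 0.1% requires a per-call rate of about 0.02%, which means a higher percentile and a larger timeout on every call.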

Seen in
