
CONCEPT

Request timeout

Definition

A request timeout is the maximum time a client waits for a response from a server after the TCP connection has already been established. It measures server-side work: queueing, computation, downstream fan-out, database calls, I/O.

"A request timeout, on the other hand, pertains to the maximum duration a client is willing to wait for a response from the server after a successful connection has been established. It measures the time it takes for the server to process the client's request and provide a response." (Source: sources/2023-07-25-zalando-all-you-need-to-know-about-timeouts)

Orthogonal to connection timeout: the two bound different phases and must be sized independently.

Sizing: driven by measured latency, not by the SLA document

The Zalando post prescribes a four-step sizing workflow:

  1. Start from the downstream's SLA only as a seed for test design — not as a trustworthy timeout value. "Not all services provide SLAs and even if they do you should not trust blindly. The SLA value is good enough only for starting to test real latency."

  2. Collect real latency metrics via shadow-mode integration: run the new dependency call in parallel to the existing production path, on a separate thread-pool, with mirrored traffic — so measurements are representative but production isn't affected. Record p50, p99, p99.9.

  3. Pick an acceptable false-timeout rate. If 0.1% is tolerable, set the request timeout to the 99.9th percentile measured latency. Lower tolerance → higher percentile → higher timeout. Making this rate explicit turns timeout sizing from guesswork into a single tunable.

  4. Choose between two timeout strategies:

      • Max timeout — set the timeout to the chosen percentile and accept the corresponding false-timeout rate.
      • Lower timeout + retries — deliberately cut the timeout below the tail and rely on retries to cover the requests that exceed the new, shorter ceiling. The trade-off is load amplification, so pair with a circuit breaker and exponential backoff + jitter.
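Step 3 of the workflow can be sketched as code. A minimal sketch, assuming a simple nearest-rank percentile over the latency samples collected in shadow mode (the class and method names here are hypothetical, not from the article):

```java
import java.util.Arrays;

public class TimeoutSizing {
    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are <= it. (Hypothetical helper.)
    static long percentile(long[] latenciesMs, double p) {
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Mirrored-traffic measurements (fabricated demo data).
        long[] samples = {120, 130, 150, 160, 180, 200, 250, 400, 650, 700};
        // Tolerating 10% false timeouts on this tiny sample -> p90;
        // with real traffic and a 0.1% tolerance you would use p99.9.
        long timeoutMs = percentile(samples, 90);
        System.out.println("request timeout = " + timeoutMs + " ms");
    }
}
```

The false-timeout tolerance maps directly to the percentile argument, which is what makes it the single tunable.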

The Zalando article frames this as a deliberate trade-off the caller team makes, not a default the library ships.
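The lower-timeout-plus-retries strategy can be sketched as follows, assuming a per-attempt timeout enforced via `Future.get` and full-jitter exponential backoff between attempts (all names and values here are illustrative; a production version would also sit behind a circuit breaker):

```java
import java.util.concurrent.*;

public class RetryWithBackoff {
    // Each attempt is capped well below the downstream p99.9; timeouts
    // are retried with exponential backoff plus full jitter.
    static <T> T callWithRetries(Callable<T> call, long attemptTimeoutMs,
                                 int maxAttempts) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            long backoffMs = 50;
            for (int attempt = 1; ; attempt++) {
                Future<T> f = pool.submit(call);
                try {
                    return f.get(attemptTimeoutMs, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    f.cancel(true); // stop the slow attempt
                    if (attempt == maxAttempts) throw e;
                    // Full jitter: sleep a random amount up to the backoff.
                    Thread.sleep(ThreadLocalRandom.current().nextLong(backoffMs + 1));
                    backoffMs *= 2;
                }
            }
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Demo: a fast call succeeds on the first attempt.
        System.out.println(callWithRetries(() -> "ok", 200, 3));
    }
}
```

The load-amplification trade-off is visible in the loop: every timeout turns one logical request into up to `maxAttempts` physical ones.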

Chained calls: SLA budgeting

When a caller with its own SLA makes a sequence of downstream calls, per-call timeouts must honour the caller's SLA. Canonical example: caller SLA 1000 ms, Order p99.9 = 700 ms, Payment p99.9 = 700 ms (sequential). The two resolutions:

  • Split the time budget (500 ms + 500 ms): guarantees SLA compliance but produces false positives because each 500 ms is below the downstream's p99.9.
  • Outer time-limiter wrap (700 ms per call + 1000 ms outer limit): exploits the observation that both downstreams rarely tail simultaneously. Implemented in Java via CompletableFuture.orTimeout(…) or Resilience4j's TimeLimiter.

The outer-limiter approach is generally preferred for chains where per-call p99.9s don't fit inside a split budget.
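The outer-limiter wrap can be sketched with `CompletableFuture.orTimeout` (the quantities match the canonical example; the call simulation and method names are hypothetical):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class OuterLimiter {
    // Simulated downstream call (stand-in for Order or Payment).
    static CompletableFuture<String> call(String name, long latencyMs) {
        return CompletableFuture.supplyAsync(() -> {
            try { Thread.sleep(latencyMs); }
            catch (InterruptedException e) { throw new RuntimeException(e); }
            return name + " ok";
        });
    }

    // Sequential chain: per-call ceilings at the downstream p99.9 (700 ms),
    // plus an outer limit enforcing the caller's own 1000 ms SLA.
    static String orderThenPayment(long orderMs, long paymentMs) {
        return call("order", orderMs)
                .orTimeout(700, TimeUnit.MILLISECONDS)           // per-call ceiling
                .thenCompose(o -> call("payment", paymentMs)
                        .orTimeout(700, TimeUnit.MILLISECONDS))  // per-call ceiling
                .orTimeout(1000, TimeUnit.MILLISECONDS)          // caller's SLA
                .join();
    }

    public static void main(String[] args) {
        System.out.println(orderThenPayment(100, 100));
    }
}
```

Either call may individually run up to 700 ms, but the outer 1000 ms limit still fires if both tail at once — which is exactly the rare case the strategy bets against.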

Interaction with downstream SLAs

Request-timeout sizing is fundamentally an SLA-composition problem:

  • If the caller's SLA is below the downstream's p99.9, the caller cannot reliably meet its SLA by waiting for the downstream — some tail requests must be short-circuited via retry, fallback, or hedging.
  • If the caller's SLA is well above downstream p99.9, set request timeout near p99.9 and accept the rare breach.
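The first case — short-circuiting tail requests with a fallback — can be sketched like this, assuming the caller has a cached or default value to serve (latencies and names are illustrative):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class FallbackOnTimeout {
    // Cap waiting at the caller's SLA; if the downstream's tail exceeds it,
    // short-circuit to a fallback value instead of breaching the SLA.
    static String fetchWithFallback(long downstreamLatencyMs, long slaMs) {
        CompletableFuture<String> downstream = CompletableFuture.supplyAsync(() -> {
            try { Thread.sleep(downstreamLatencyMs); }
            catch (InterruptedException e) { throw new RuntimeException(e); }
            return "fresh value";
        });
        return downstream
                .orTimeout(slaMs, TimeUnit.MILLISECONDS)
                .exceptionally(t -> "cached value") // fallback on timeout
                .join();
    }

    public static void main(String[] args) {
        // Downstream tail (500 ms) exceeds the caller's 100 ms SLA.
        System.out.println(fetchWithFallback(500, 100));
    }
}
```

Hedging and retry variants follow the same shape: the SLA, not the downstream's latency distribution, bounds how long the caller waits.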

Server-side implications

The Zalando post makes one consequence explicit:

"Even if the client has closed the connection, without a proper timeout configuration the request is still being processed on your side, which means that resources are busy."

A server that doesn't mirror its clients' request timeouts keeps executing work whose caller has already given up, wasting CPU, DB connections, and downstream fan-out. Symmetric client/server request-timeout configuration is the full discipline; the article covers only the client side but flags the server-side cost.
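A minimal sketch of the server-side half, assuming handler work is run on an executor so it can be bounded and cancelled (the handler, latencies, and timeout value are hypothetical):

```java
import java.util.concurrent.*;

public class ServerSideTimeout {
    // Bound the handler's work by the same timeout the client uses, and
    // cancel it on expiry so the thread (and, in real code, DB handles)
    // are actually released instead of staying busy.
    static String handle(long workMs, long requestTimeoutMs) throws Exception {
        ExecutorService handlers = Executors.newSingleThreadExecutor();
        Future<String> work = handlers.submit(() -> {
            Thread.sleep(workMs); // stand-in for a slow downstream call
            return "done";
        });
        try {
            return work.get(requestTimeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            work.cancel(true); // the client has given up; stop the work
            return "cancelled";
        } finally {
            handlers.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(handle(5_000, 300));
    }
}
```

Cancellation only helps if the work is interruptible — blocking calls must respond to `Thread.interrupt()` for the resources to be freed.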
