CONCEPT Cited by 1 source

Round-trip time (RTT)¶

Definition¶

Round-trip time (RTT) is the time it takes for a signal to travel from sender to receiver and back across a network path. It is the foundational latency measurement from which many networking-timeout values — notably the connection timeout — are derived.

RTT has a physical floor set by the speed of light in the transmission medium. On top of that floor, every device and queue on the path adds delay. The components are:

Propagation delay (dominant on long paths).
Serialisation delay (clocking a packet's bits onto the wire at the interface rate).
Queuing and forwarding delay at each hop.
Processing / ACK-scheduling delay at the endpoints.
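
The propagation floor can be estimated from path length alone. The sketch below is illustrative only: the refractive index of about 1.5 for optical fibre and the two great-circle distances are assumptions introduced here, not figures from the Zalando post, and `rtt_floor_ms` is a hypothetical helper name. Real cable routes are longer than the great-circle path, so measured RTTs sit above this floor.

```python
# Speed of light in vacuum, in km/s.
C_VACUUM_KM_S = 299_792.458

# Assumption: typical refractive index of optical fibre (light travels at ~c / 1.5).
FIBRE_REFRACTIVE_INDEX = 1.5


def rtt_floor_ms(path_km: float, refractive_index: float = FIBRE_REFRACTIVE_INDEX) -> float:
    """Lower bound on RTT (ms) from propagation delay alone, out and back."""
    if path_km < 0:
        raise ValueError("path_km must be non-negative")
    speed_km_s = C_VACUUM_KM_S / refractive_index
    one_way_s = path_km / speed_km_s
    return 2 * one_way_s * 1000


# Assumed approximate great-circle distances, for illustration only.
ROUTES_KM = {"NYC <-> SF": 4_148, "NYC <-> Sydney": 15_993}

for name, km in ROUTES_KM.items():
    print(f"{name}: floor of roughly {rtt_floor_ms(km):.1f} ms")
```

Under these assumptions the floor comes out close to 42 ms for the shorter route and close to 160 ms for the longer one, which suggests how little of a long-haul RTT is left for the other three components.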

Reference numbers¶

The Zalando timeouts post cites these canonical RTTs as calibration anchors:

Same data-centre / same AWS region: sub-millisecond.
NYC ↔ SF on fibre: ~42 ms.
NYC ↔ Sydney: ~160 ms.
Author's machine → recommended AWS Region: 28 ms (measured via the public AWS WorkSpaces Connection Health Check UI).

The long-distance figures are dominated by propagation delay. Mobile clients, VPN tunnels, and congested links produce substantially higher and more variable RTTs.

Why RTT drives connection-timeout sizing¶

From the client's side, the three-way handshake completes in approximately one RTT: the SYN goes out and the SYN-ACK comes back. Setting the connection timeout to a small multiple of RTT (the Zalando post recommends RTT × 3) gives enough margin to absorb transient jitter and service-startup delay without pinning the caller on a dead peer.

Setting connection timeouts without regard for RTT is a common antipattern. A same-DC caller with a 30-second connection timeout can wait the full 30 seconds on each unreachable peer when about 10 ms would suffice, while a mobile caller with a 50 ms connection timeout will see constant false failures.
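
As a minimal sketch of the RTT × 3 heuristic, the snippet below derives a connection timeout from a measured RTT and passes it to Python's standard `socket.create_connection`. The helper names `connect_timeout_s` and `open_connection` are hypothetical, chosen for illustration.

```python
import socket


def connect_timeout_s(rtt_s: float, multiplier: float = 3.0) -> float:
    """Connection timeout in seconds as a small multiple of the measured RTT."""
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    return rtt_s * multiplier


def open_connection(host: str, port: int, rtt_s: float) -> socket.socket:
    """Open a TCP connection, giving up after roughly 3 x RTT.

    Note: the timeout passed to create_connection stays set on the returned
    socket, so later reads and writes inherit it unless settimeout() is called.
    """
    return socket.create_connection((host, port), timeout=connect_timeout_s(rtt_s))
```

For the reference numbers above, a 42 ms RTT yields a connection timeout of about 126 ms, and a 160 ms RTT yields about 480 ms.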

RTT vs. operation latency¶

RTT is a network property; operation latency is a server-plus-network property. The distinction is why the Zalando post insists that connection timeout and request timeout must be sized independently:

Connection timeout: derived from RTT (network quality).
Request timeout: derived from measured server-side latency percentiles (p99, p99.9).

Conflating them erases the distinction between "the network is fine but the server is slow" and "the server is unreachable."
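
One way to keep the two timeouts independent, sketched under assumptions: the connection timeout comes from RTT, while the request timeout comes from a percentile of recorded server-side latencies. The nearest-rank percentile method and the helper names below are illustrative choices, not something the Zalando post prescribes.

```python
import math
import socket
from typing import Sequence


def nearest_rank_percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def open_with_split_timeouts(
    host: str,
    port: int,
    rtt_s: float,
    server_latencies_s: Sequence[float],
    pct: float = 99.0,
) -> socket.socket:
    """Connect with an RTT-derived timeout, then switch to a latency-derived one."""
    connect_timeout = 3 * rtt_s  # network quality
    request_timeout = nearest_rank_percentile(server_latencies_s, pct)  # server behaviour

    sock = socket.create_connection((host, port), timeout=connect_timeout)
    # From here on, blocking reads and writes use the request timeout instead.
    sock.settimeout(request_timeout)
    return sock
```

With this split, a timeout raised during `create_connection` points at reachability, while a timeout on a later `recv` points at a slow server.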

Measurement¶

RTT can be probed with:

ping (ICMP echo).
TCP SYN + SYN-ACK timing.
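HTTP HEAD against a known-lightweight endpoint (an upper bound: the timing also includes server processing and, on a new connection, connection setup).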
Cloud-provider health-check UIs (AWS WorkSpaces Connection Health Check, etc.) — good for anchoring end-user-side RTT when building a client-focused service.
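
The TCP SYN + SYN-ACK approach can be approximated from user space by timing how long a connect call takes. The sketch below is a rough probe, not a precise tool: `tcp_connect_rtt_s` is a hypothetical helper, the measurement includes local kernel and scheduling overhead, and passing a hostname instead of an IP address would add DNS resolution time to the first sample.

```python
import socket
import time


def tcp_connect_rtt_s(host: str, port: int, samples: int = 5, timeout_s: float = 2.0) -> float:
    """Estimate RTT in seconds as the fastest of several TCP connect timings.

    connect() returns once the SYN-ACK has arrived, so each timing covers
    about one round trip. Taking the minimum filters out queuing noise.
    """
    if samples < 1:
        raise ValueError("samples must be at least 1")
    best = float("inf")
    for _ in range(samples):
        start = time.perf_counter()
        # Prefer an IP address for host so that DNS lookup is not timed.
        with socket.create_connection((host, port), timeout=timeout_s):
            elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best
```

A call such as `tcp_connect_rtt_s("192.0.2.10", 443)` (a documentation-range address used here as a placeholder) returns the estimate in seconds, which can then feed the RTT × 3 sizing.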

Seen in¶

sources/2023-07-25-zalando-all-you-need-to-know-about-timeouts — anchors the RTT × 3 connection-timeout heuristic with concrete RTT numbers.

concepts/connection-timeout — the timeout sized from RTT.
concepts/tcp-three-way-handshake — the RTT-bounded event that connection timeouts cover.
concepts/tail-latency-at-scale — wider framing of latency-percentile management.
patterns/connection-timeout-rtt-times-three — the sizing heuristic itself.