CONCEPT Cited by 1 source

Connection timeout¶

Definition¶

A connection timeout is the maximum time a client waits while attempting to establish a network connection with a server. For TCP-based traffic this means completing the three-way handshake (SYN / SYN-ACK / ACK) before the operation is declared a failure.

"A connection timeout refers to the maximum amount of time a client is willing to wait while attempting to establish a connection with a server. It measures the time it takes for a client to successfully establish a network connection with a server." (Source: sources/2023-07-25-zalando-all-you-need-to-know-about-timeouts)

A connection timeout only applies during connection establishment. Once the TCP connection is up, subsequent bytes are governed by the request timeout — a separate, differently-sized bound.

What actually causes a connection timeout¶

A connection timeout fires when the initial handshake does not complete. Common root causes named by the Zalando post:

The remote host is down or the process has crashed.
Wrong IP / DNS name.
Wrong port.
Network connectivity to the server is down (intermediate router failure, routing change).
A firewall or security group silently drops SYN packets rather than rejecting them (the remote endpoint's policy is to blackhole certain traffic).

These are all "the server never answered the door" conditions, structurally distinct from "the server accepted my request but is taking a long time to respond."
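
As a rough illustration of how these causes look to a client, the sketch below maps exceptions from Python's standard `socket` module to coarse labels. The function names and label strings are hypothetical, not taken from the Zalando post. Not every cause listed above ends in a timeout: a host that is up but has nothing listening on the port usually answers with a reset, which fails immediately as "connection refused", while a silently dropped SYN is what actually runs the timer out.

```python
import socket


def classify_connect_failure(exc: BaseException) -> str:
    """Map an exception raised during connection setup to a coarse label.

    The labels are illustrative only. Specific types are checked first
    because all of them are subclasses of OSError.
    """
    if isinstance(exc, socket.gaierror):
        return "dns"      # the host name did not resolve
    if isinstance(exc, socket.timeout):
        return "timeout"  # nobody answered: host down, route broken, SYN dropped
    if isinstance(exc, ConnectionRefusedError):
        return "refused"  # host answered with a reset: wrong port or no listener
    if isinstance(exc, OSError):
        return "network"  # e.g. network unreachable, no route to host
    return "unknown"


def try_connect(host: str, port: int, timeout_s: float) -> str:
    """Attempt one TCP connection and report "ok" or a failure label."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return "ok"
    except OSError as exc:
        return classify_connect_failure(exc)
```

Logging the label alongside the elapsed time makes it easier to tell a blackholed endpoint (always close to the full timeout) from a misconfigured port (fails in roughly one RTT).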

Sizing: derive from RTT, not from operation latency¶

The key discipline is:

"A connection timeout should be sufficient to complete this process [the three-way handshake] and the actual transmission of packets is gated by the quality of the connection."

Therefore the connection timeout should be derived from the network round-trip time between client and server, not from how long the downstream operation takes.

Typical numbers cited in the post:

Same data-centre / same AWS region: sub-millisecond RTT.
NYC ↔ SF on fibre: ~42 ms RTT.
NYC ↔ Sydney: ~160 ms RTT.
Mobile client ↔ remote region: tens to hundreds of ms, highly variable.

Same-DC traffic warrants a short connection timeout (single- to double-digit ms); mobile or cross-continent traffic needs a wider margin.
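
If no RTT figure is available for a given client/server pair, one way to estimate it is to time a few TCP connects, since `connect()` returns once the SYN-ACK arrives, which is roughly one round trip. The following is a minimal sketch using only the Python standard library; the function name, the sample count and the choice of the median are assumptions, not recommendations from the post. Passing an IP address rather than a host name keeps DNS resolution out of the measurement.

```python
import socket
import statistics
import time


def measure_connect_rtt_ms(host: str, port: int, samples: int = 5,
                           timeout_s: float = 1.0) -> float:
    """Return the median TCP connect time in milliseconds.

    Each sample opens a fresh connection and closes it immediately.
    Connect time approximates one RTT plus local kernel overhead.
    """
    timings_ms = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout_s):
            elapsed = time.perf_counter() - start
        timings_ms.append(elapsed * 1000.0)
    return statistics.median(timings_ms)
```

A measurement like this is only a snapshot; for mobile or cross-continent paths the spread between samples matters as much as the median.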

The RTT × 3 heuristic¶

The post canonicalises Connection timeout = RTT × 3 as a conservative default:

"You can set up a connection timeout which is some multiple of your expected RTT. Connection timeout = RTT × 3 is commonly used as a conservative approach, but you can adjust it based on your specific needs."

This multiple covers the three-way handshake (one RTT of wire time plus slack) with enough margin to absorb transient jitter and service-startup race conditions.
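
Applied to the reference RTTs quoted earlier, the heuristic is simple arithmetic. The helper below is a sketch: its name and the optional `floor_ms` parameter are assumptions added for illustration (a floor stops sub-millisecond same-DC RTTs from producing an impractically small timeout); only the × 3 multiplier comes from the post.

```python
def connection_timeout_ms(rtt_ms: float, multiplier: float = 3.0,
                          floor_ms: float = 0.0) -> float:
    """Connection timeout = RTT x multiplier, never below floor_ms."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    return max(rtt_ms * multiplier, floor_ms)


print(connection_timeout_ms(42))                 # NYC <-> SF: 126.0
print(connection_timeout_ms(160))                # NYC <-> Sydney: 480.0
print(connection_timeout_ms(0.5, floor_ms=5.0))  # same DC, floored: 5.0
```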

Common antipattern: sizing connection timeout to match request timeout¶

A widespread mistake — explicitly called out in the Zalando post — is to set connection timeout ≈ request timeout, on the belief that a single "socket timeout" is simpler:

"A common practice for microservices is to set a connection timeout equal to or slightly lower than the timeout for the operation. This approach may not be ideal since the two processes are different. Whereas establishing a connection is a relatively quick process, an operation can take hundreds or thousands of ms!"

Sizing them together blurs the distinction between "cannot reach the server" (short, retryable via DNS/routing/failover) and "the server is overloaded" (long, retryable with a circuit breaker). Separate timeouts preserve diagnostic signal.
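
A minimal sketch of keeping the two bounds separate, using Python's standard `socket` module: the connect phase gets a short timeout, the socket is then switched to a longer one for the request, and the outcome records which phase failed. The function name and the outcome labels are hypothetical; a real client would normally get the same split from its HTTP library's own connect and read timeout settings.

```python
import socket
from typing import Tuple


def call(host: str, port: int, payload: bytes,
         connect_timeout_s: float, request_timeout_s: float) -> Tuple[str, bytes]:
    """Send payload and read one reply, reporting which phase failed.

    Returns (outcome, data) where outcome is one of:
      "unreachable": the TCP connection could not be established
      "slow":        connected, but no reply within request_timeout_s
      "broken":      connected, but the connection failed mid-request
      "ok":          a reply (possibly empty) was received
    """
    try:
        # Short bound, sized from RTT: covers connection setup only.
        sock = socket.create_connection((host, port), timeout=connect_timeout_s)
    except OSError:
        return ("unreachable", b"")

    with sock:
        # Longer bound, sized from the operation's expected latency.
        sock.settimeout(request_timeout_s)
        try:
            sock.sendall(payload)
            data = sock.recv(4096)
        except socket.timeout:
            return ("slow", b"")
        except OSError:
            return ("broken", b"")
        return ("ok", data)
```

With the outcomes kept apart, "unreachable" can feed failover or re-resolution logic while "slow" feeds load-shedding or a circuit breaker, which is the diagnostic signal a single shared timeout would erase.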

Trade-offs the timeout value encodes¶

Too low → false failures during a brief network blip, service startup, or TLS renegotiation.
Too high → the caller holds its outbound socket, its thread, and its upstream connection while a dead peer fails to respond; repeated failures drain the caller's pools (see concepts/thread-pool-exhaustion).
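
The cost of a too-high value can be estimated with Little's law (callers in flight = arrival rate × time each one waits). The sketch below assumes the worst case, where the peer is blackholed so that every call blocks for the full connection timeout and nothing is retried; the function name and the traffic figures are hypothetical.

```python
def callers_blocked(calls_per_second: float, connection_timeout_s: float) -> float:
    """Steady-state number of callers stuck waiting on a peer that never answers.

    Little's law: L = lambda * W, with W equal to the full connection timeout.
    """
    return calls_per_second * connection_timeout_s


# 200 calls/s against a dead peer:
print(callers_blocked(200, 30))     # 30 s timeout: thousands of blocked callers
print(callers_blocked(200, 0.126))  # 126 ms timeout (42 ms RTT x 3): a few dozen
```

Comparing the result with the size of the caller's thread or connection pool gives a quick check on whether a single unreachable dependency could exhaust it.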

Zalando's summary: "the connection timeout for a microservice should be set low enough so that it can quickly detect an unreachable service, but high enough to allow the service to start up or recover from a short-lived problem."

Seen in¶

sources/2023-07-25-zalando-all-you-need-to-know-about-timeouts — the canonical deep-dive on connection vs. request timeouts; provides the RTT × 3 heuristic and the RTT reference numbers.

concepts/request-timeout — the post-handshake companion timeout for server-side work.
concepts/tcp-three-way-handshake — the protocol-level event this timeout bounds.
concepts/round-trip-time-rtt — the network property this timeout is sized from.
concepts/fail-fast-principle — the design principle behind setting tight, explicit timeouts.
concepts/thread-pool-exhaustion — the failure mode that too-high connection timeouts eventually produce.
patterns/connection-timeout-rtt-times-three — the sizing heuristic.
patterns/explicit-timeout-on-remote-calls — the broader rule: every remote call gets explicit timeouts.