CONCEPT Cited by 1 source
closeWait socket state
CLOSE_WAIT is a TCP endpoint state: the remote peer has sent
a FIN (it closed its side), but the local application has
not yet called close() (or shutdown(SHUT_WR)) on the
socket. The kernel is waiting for the application to finish
reading and close the socket.
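A minimal sketch of how an application observes the peer's FIN and then leaves CLOSE_WAIT, assuming a loopback TCP pair. The helper names `read_until_peer_closes` and `demo_peer_close` are hypothetical, chosen for illustration only.

```python
import socket


def read_until_peer_closes(conn: socket.socket) -> bytes:
    """Read until recv() returns b"" (end of stream), then close our side."""
    chunks = []
    while True:
        data = conn.recv(4096)
        if not data:
            # Empty read: the peer's FIN has arrived, so the kernel has
            # already moved this socket to CLOSE_WAIT.
            break
        chunks.append(data)
    conn.close()  # our close() sends our FIN; the socket leaves CLOSE_WAIT
    return b"".join(chunks)


def demo_peer_close() -> bytes:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()
    client.sendall(b"ping")
    client.close()  # the remote peer closes first
    try:
        return read_until_peer_closes(conn)
    finally:
        listener.close()
```

If the `conn.close()` call were never reached, the accepted socket would stay in CLOSE_WAIT for as long as the process kept the descriptor open.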
Why a pile-up is diagnostic
"Sockets remaining in
closeWait state indicate that the remote peer closed the socket, but it was never closed on the local instance, presumably because the application failed to do so. This can often indicate that the application is hanging in an abnormal state, in which case application thread dumps may reveal additional insight." (Source: sources/2024-07-29-netflix-java-21-virtual-threads-dude-wheres-my-lock)
A growing CLOSE_WAIT count is a leading indicator of
application-layer hangs: the kernel's TCP state machine is
healthy, but the application code that owns the socket lifecycle is not.
Observability
- ss -tan state close-wait: Linux ss lists the sockets in that state; count the rows for a total.
- netstat -ant | grep CLOSE_WAIT: the classic equivalent.
- /proc/<pid>/fd/: enumerate the process's file descriptors.
- Cloud-native: most observability platforms surface per-state TCP socket counts as a host metric.
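For a scripted check, per-state counts can also be derived from the Linux /proc/net/tcp table. The following is a sketch under stated assumptions: it relies on the kernel's text layout (state as a hex code in the fourth whitespace-separated column, with 08 meaning CLOSE_WAIT), and the function names `count_tcp_states` and `count_host_tcp_states` are made up for this example.

```python
from collections import Counter
from pathlib import Path

# Hex state codes as printed in the "st" column of /proc/net/tcp on Linux.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text: str) -> Counter:
    """Count sockets per TCP state from /proc/net/tcp-formatted text."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated rows
        counts[TCP_STATES.get(fields[3].upper(), "UNKNOWN")] += 1
    return counts


def count_host_tcp_states() -> Counter:
    """Sum IPv4 and IPv6 tables for the current network namespace."""
    total = Counter()
    for name in ("/proc/net/tcp", "/proc/net/tcp6"):
        path = Path(name)
        if path.exists():
            total += count_tcp_states(path.read_text())
    return total
```

On a Linux host, `count_host_tcp_states()["CLOSE_WAIT"]` sampled periodically gives the trend; a value that only ever rises is the pile-up signature described above.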
The Netflix 2024-07-29 shape
The Netflix incident produced a 1:1 correspondence between
closeWait socket count and "blank virtual thread" count in
the jcmd dump:
- Tomcat accepts a connection.
- Creates a request virtual thread.
- Passes it to the VT executor.
- The VT is scheduled as a task but can't mount, because the carrier threads are pinned by earlier VTs inside synchronized blocks.
- The VT sits as a blank task in the queue, still holding the accepted socket.
- The downstream client times out and closes.
- The socket enters CLOSE_WAIT and stays there until the JVM restarts.
The CLOSE_WAIT count grows in proportion to the incoming-request
rate until the application is restarted.
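The sequence above can be imitated in miniature. This is a rough Python analogue, not Java virtual threads or Tomcat: a small thread pool stands in for the carrier threads, a `threading.Event` stands in for the contended lock, and the function name `simulate_stuck_handlers` and its result keys are invented for illustration. Unlike the real incident, the sketch releases the blocked workers at the end so that it can clean up after itself.

```python
import socket
import threading
from concurrent.futures import ThreadPoolExecutor


def simulate_stuck_handlers(num_requests: int = 3, workers: int = 2) -> dict:
    gate = threading.Event()  # stand-in for the lock nobody can acquire
    all_busy = threading.Barrier(workers + 1, timeout=5)
    pool = ThreadPoolExecutor(max_workers=workers)  # stand-in for carriers

    def occupy_worker() -> None:
        all_busy.wait()  # report that this worker is taken
        gate.wait()      # then block, like a pinned carrier thread

    for _ in range(workers):
        pool.submit(occupy_worker)
    all_busy.wait()  # from here on, every worker is blocked

    def handle(conn: socket.socket) -> None:
        with conn:        # the close path exists, but it has to run first
            conn.recv(1)

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(num_requests)

    accepted, futures = [], []
    for _ in range(num_requests):
        client = socket.create_connection(listener.getsockname())
        conn, _ = listener.accept()
        futures.append(pool.submit(handle, conn))  # queued, cannot start
        accepted.append(conn)
        client.close()  # the client gives up: conn is now in CLOSE_WAIT

    queued = sum(1 for f in futures if not f.running() and not f.done())
    unclosed = sum(1 for c in accepted if c.fileno() != -1)

    gate.set()                # "restart": unblock the workers
    pool.shutdown(wait=True)  # queued handlers finally run and close
    listener.close()
    closed_after = sum(1 for c in accepted if c.fileno() == -1)
    return {
        "queued_tasks": queued,
        "unclosed_sockets": unclosed,
        "closed_after_release": closed_after,
    }
```

While the workers are blocked, the number of queued handler tasks equals the number of accepted-but-unclosed sockets, which mirrors the 1:1 correspondence seen in the jcmd dump.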
What it is NOT
- CLOSE_WAIT is not a half-open connection.
- It does not imply a kernel bug.
- It is almost always an application bug, usually one of:
(a) the application is deadlocked and never gets to the
close path;
(b) the application forgot to call close() on the error path;
(c) a framework's connection-pool logic has a bug.
The Netflix case is (a) at the framework level — Tomcat's request-handling code path does include close logic, but the VT that would run it cannot mount.
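Case (b) is the easiest of the three to guard against in code. A minimal sketch, assuming a handler that fails partway through a request; `handle_request`, `serve_once_safely`, and `demo_error_path` are hypothetical names, and the `ValueError` is a simulated failure.

```python
import socket


def handle_request(conn: socket.socket) -> None:
    """Hypothetical handler that fails after reading the request."""
    conn.recv(16)
    raise ValueError("simulated failure while handling the request")


def serve_once_safely(listener: socket.socket) -> socket.socket:
    conn, _ = listener.accept()
    try:
        with conn:  # closes conn on normal exit and on exceptions alike
            handle_request(conn)
    except ValueError:
        pass  # the error is handled here; the socket is already closed
    return conn


def demo_error_path() -> bool:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        with socket.create_connection(listener.getsockname()) as client:
            client.sendall(b"hello")
            conn = serve_once_safely(listener)
        return conn.fileno() == -1  # -1 means the descriptor was released
```

Scoping the socket to a `with` block (or a `try`/`finally`) ties the close to leaving the handler rather than to reaching a particular line, so an exception cannot skip it. It does not help with case (a), where the handler never runs at all.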
Seen in
- sources/2024-07-29-netflix-java-21-virtual-threads-dude-wheres-my-lock
— Canonical wiki instance of
CLOSE_WAIT pile-up as the external symptom of VT carrier-thread exhaustion. The growing CLOSE_WAIT count was Netflix's first-order detection signal.
Related
- systems/embedded-tomcat — The accepting application stack.
- concepts/virtual-thread-pinning — The root cause that produces the pile-up.
- concepts/carrier-thread — The resource whose exhaustion stops the close path from running.
- companies/netflix — Canonical production instance.