
CONCEPT Cited by 1 source

closeWait socket state

CLOSE_WAIT is a TCP endpoint state: the remote peer has sent a FIN (it closed its side), but the local application has not yet called close() (or shutdown(SHUT_WR)) on the socket. The kernel has acknowledged the FIN and is waiting for the application to finish reading and close the socket.
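The definition above can be reproduced on loopback. The following is a minimal sketch, assuming nothing beyond Python's standard socket module; the helper name make_close_wait_pair is hypothetical and exists only for illustration. The peer closes first, the local side reads end-of-stream, and the socket is in CLOSE_WAIT from that point until the local close().

```python
import socket


def make_close_wait_pair():
    """Return (server_side, listener); server_side's peer has already closed.

    Hypothetical helper for illustration only.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    client.close()  # the remote peer sends its FIN here
    return server_side, listener


server_side, listener = make_close_wait_pair()
server_side.settimeout(5)

# recv() returning b"" is how the application learns the peer has closed.
eof = server_side.recv(1024)

# Between the peer's FIN and the close() below, the kernel reports this
# socket as CLOSE_WAIT. An application that never reaches this line
# leaves the socket in that state indefinitely.
server_side.close()
listener.close()
```

Pausing before server_side.close() and inspecting the process's sockets from another terminal should show the connection in CLOSE_WAIT.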

Why a pile-up is diagnostic

"Sockets remaining in closeWait state indicate that the remote peer closed the socket, but it was never closed on the local instance, presumably because the application failed to do so. This can often indicate that the application is hanging in an abnormal state, in which case application thread dumps may reveal additional insight." (Source: sources/2024-07-29-netflix-java-21-virtual-threads-dude-wheres-my-lock)

A growing CLOSE_WAIT count is a leading indicator of application-layer hangs — the kernel's TCP state machine is healthy, but the application handling socket lifecycle is not.

Observability

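  • ss -tan state close-wait: lists the sockets currently in CLOSE_WAIT, one per line. Adding -H suppresses the header, so piping to wc -l gives a count.
  • netstat -ant | grep CLOSE_WAIT: the classic equivalent.
  • /proc/<pid>/fd/: enumerates the process's file descriptors. Sockets appear as symlinks of the form socket:[inode], which can be matched against the inode column of /proc/net/tcp.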
  • Cloud-native: most observability platforms surface per-state TCP socket counts as a host metric.
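Where none of those tools are installed, the same per-state counts can be derived from /proc/net/tcp on Linux, where the fourth column holds the socket state as a hex code (08 is CLOSE_WAIT). The sketch below is illustrative: the function name count_tcp_states is hypothetical, and SAMPLE is made-up data in the /proc/net/tcp layout, not output from a real host.

```python
from collections import Counter

# State codes used in the "st" column of /proc/net/tcp on Linux.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per TCP state from the text of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        code = fields[3].upper()
        counts[TCP_STATES.get(code, code)] += 1
    return counts


# Made-up sample in the /proc/net/tcp layout (one listener, two CLOSE_WAIT).
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:1F90 00000000:0000 0A 00000000:00000000 00:00000000 00000000  1000        0 11111
   1: 0100007F:1F90 0100007F:C350 08 00000000:00000000 00:00000000 00000000  1000        0 22222
   2: 0100007F:1F90 0100007F:C351 08 00000000:00000000 00:00000000 00000000  1000        0 33333
"""

sample_counts = count_tcp_states(SAMPLE)
```

On a live Linux host the input would be the contents of /proc/net/tcp (and /proc/net/tcp6 for IPv6 sockets). Sampling the CLOSE_WAIT count periodically gives the trend line that makes a pile-up visible.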

The Netflix 2024-07-29 shape

The Netflix incident produced a 1:1 correspondence between closeWait socket count and "blank virtual thread" count in the jcmd dump:

  • Tomcat accepts a connection.
  • Creates a request virtual thread.
  • Passes it to the VT executor.
  • The VT is scheduled as a task — but can't mount, because the carrier threads are pinned by earlier VTs inside synchronized blocks.
  • The VT sits as a blank task in the queue, still holding the accepted socket.
  • The downstream client times out and closes.
  • The socket enters CLOSE_WAIT — and stays there until the JVM restarts.

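Each new request that arrives while the carriers are pinned leaves one more socket stuck, so the CLOSE_WAIT count grows at a rate proportional to the incoming-request rate until the application is restarted.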
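The sequence above can be imitated outside the JVM. The following is a loose analogy in Python, not Tomcat or virtual-thread code: a single-worker ThreadPoolExecutor stands in for the pinned carrier threads, and a task blocked on an Event stands in for the earlier pinned VTs. All names (simulate, handle, release) are hypothetical.

```python
import socket
import threading
from concurrent.futures import ThreadPoolExecutor


def simulate(n_requests=3):
    """Return (handlers_stuck_while_blocked, handlers_run_after_release)."""
    release = threading.Event()
    handled = []

    def handle(conn):
        # The close path exists, but it only runs if a worker is available.
        conn.close()
        handled.append(conn)

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(n_requests)

    pool = ThreadPoolExecutor(max_workers=1)
    pool.submit(release.wait)  # occupies the only worker, like a pinned carrier

    for _ in range(n_requests):
        client = socket.create_connection(listener.getsockname())
        conn, _ = listener.accept()  # server accepts the connection
        pool.submit(handle, conn)    # handler is queued but cannot start
        client.close()               # client gives up; conn is now CLOSE_WAIT

    # No handler has run yet: every accepted socket is still open locally.
    stuck = n_requests - len(handled)

    release.set()             # analogous to the blockage clearing
    pool.shutdown(wait=True)  # queued handlers finally run and close
    listener.close()
    return stuck, len(handled)


stuck, closed_after_release = simulate(3)
```

While the worker is blocked, the number of queued handlers equals the number of sockets left in CLOSE_WAIT, which mirrors the 1:1 correspondence described above. In the Netflix incident the blockage never cleared, so the sockets persisted until the JVM was restarted.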

What it is NOT

  • CLOSE_WAIT is not a half-open connection: the peer shut down cleanly with a FIN, and the local kernel knows it.
  • It does not imply a kernel bug.
  • It is almost always an application bug — either: (a) the application is deadlocked and never gets to the close path; (b) the application forgot to call close() on the error path; (c) a framework's connection-pool logic has a bug.

The Netflix case is (a) at the framework level — Tomcat's request-handling code path does include close logic, but the VT that would run it cannot mount.
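Cause (b), a missing close() on the error path, is the one most directly addressed in application code. As a minimal sketch, assuming a hypothetical handle function and a deliberately failing callback, scoping the socket to a with block ensures close() runs whether the handler returns or raises:

```python
import socket


def handle(conn, process):
    """Read one message and process it; conn is closed on every exit path."""
    with conn:  # socket objects are context managers; exit calls close()
        data = conn.recv(1024)
        return process(data)


def failing(data):
    # Hypothetical callback that simulates an error while handling a request.
    raise ValueError("simulated handler failure")


a, b = socket.socketpair()
b.sendall(b"ping")

try:
    handle(a, failing)
except ValueError:
    pass

# fileno() is -1 once a socket has been closed.
closed_on_error = a.fileno() == -1
b.close()
```

This does not help with cause (a): if the code that owns the socket never gets to run, no amount of cleanup logic inside it will execute.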

Seen in
