

Virtual-thread pinning

Virtual-thread pinning is the Java 21 failure mode where a virtual thread (VT) cannot unmount from its carrier OS thread while blocked. The VT holds the carrier through the entire blocking region — defeating the core VT guarantee that blocking operations release the OS thread for other work.

The canonical trigger

From the JDK 21 core docs:

"a VT will be pinned to the underlying OS thread if it performs a blocking operation while inside a synchronized block or method."

Also pins:

  • Native-frame blocking (JNI calls that acquire monitors).
  • Object.wait() on a monitor (same root cause as synchronized).

Notably does not pin:

  • ReentrantLock.lock() / LockSupport.park() / Future.get() outside a synchronized region.
  • Blocking I/O outside synchronized.

Why synchronized in particular

synchronized compiles to JVM monitor operations (monitorenter / monitorexit) that are tracked at the JVM frame level. The VT runtime's continuation-capture machinery does not currently know how to save/restore monitor state safely across an unmount, so it just keeps the VT mounted.
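The trigger can be sketched in a few lines (class and method names here are illustrative, not from the source):

```java
import java.time.Duration;

public class PinningDemo {
    private static final Object MONITOR = new Object();

    // A blocking call inside synchronized: the monitor is held at the JVM
    // frame level, so the continuation cannot be captured and the virtual
    // thread stays mounted on its carrier for the whole sleep (JDK 21).
    static void pinnedSleep() {
        synchronized (MONITOR) {                      // monitorenter
            try {
                Thread.sleep(Duration.ofMillis(100)); // blocks while pinned
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }                                             // monitorexit
    }

    public static void main(String[] args) throws Exception {
        // Run with -Djdk.tracePinnedThreads=full to see the pin event logged.
        Thread vt = Thread.ofVirtual().start(PinningDemo::pinnedSleep);
        vt.join();
        System.out.println("done");
    }
}
```

Moving the sleep outside the synchronized block, or replacing the monitor with a ReentrantLock, lets the VT unmount for the duration of the blocking call.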

How pinning becomes fatal

On a machine with N carrier threads (default: N = vCPU count), N simultaneously pinned VTs exhaust all carriers. Once exhausted:

  • The JVM can still create new VTs (each is just a task record + heap-allocated continuation).
  • New VTs cannot mount, because every carrier is held by a pinned VT.
  • Pinned VTs cannot progress, because whatever they're waiting on (lock, I/O) requires another VT to run — and no carriers are available.

The result is a starvation deadlock: strictly speaking not a cycle, but operationally indistinguishable from one.
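The exhaustion mechanism above can be demonstrated in bounded form (names hypothetical; assumes a JDK 21 runtime where synchronized pins): N virtual threads each pin a carrier by sleeping inside a monitor, and a probe VT started afterwards cannot mount until one of them lets go.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

public class CarrierExhaustionDemo {
    public static void main(String[] args) throws Exception {
        int carriers = Runtime.getRuntime().availableProcessors(); // default pool size
        CountDownLatch allPinned = new CountDownLatch(carriers);
        AtomicBoolean anyReleased = new AtomicBoolean(false);

        // Pin every carrier: each VT blocks inside its own monitor.
        for (int i = 0; i < carriers; i++) {
            Object lock = new Object();
            Thread.ofVirtual().start(() -> {
                synchronized (lock) {
                    allPinned.countDown();
                    try { Thread.sleep(300); } catch (InterruptedException e) { /* demo */ }
                    anyReleased.set(true); // set just before this carrier is freed
                }
            });
        }
        allPinned.await(); // main is a platform thread; it needs no carrier

        // This VT is created fine (just a task record + continuation),
        // but it cannot mount while every carrier is held.
        Thread probe = Thread.ofVirtual().start(() ->
                System.out.println("probe mounted; a carrier had been freed: "
                        + anyReleased.get()));
        probe.join();
    }
}
```

On JDK 21 the probe should report true, because it can only mount after a pinned VT exits its synchronized block; on runtimes with JEP 491, synchronized no longer pins and the probe may run immediately. With an unbounded wait instead of the 300 ms sleep, this becomes the starvation deadlock described above.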

Symptom pattern

Netflix's 2024-07-29 production instance:

  • Sockets pile up in closeWait (Tomcat accepts, creates request-VT, but the VT can't mount to run the close path).
  • jcmd Thread.dump_to_file shows thousands of "blank" VTs: thread objects exist, no stack trace, never started. ~1:1 with closeWait count.
  • Only 4 VTs with stack traces, all on the synchronized-wrapped path. All 4 blocked on the same lock.
  • No JVM monitoring signal that the pool is "exhausted" — appears as "process is hung" from the outside.

Contrast with classic deadlock

This is distinct from classic deadlock:

  • Classic deadlock: cyclic wait among N lock holders.
  • VT-pinning deadlock: N VTs waiting on a single lock while holding all carrier threads, which prevents the thread that would release the lock from ever running.

The observable symptom (process hangs, lock waiters in the dump) looks similar, but the remediation is different — there's no cycle to break, you have to free a carrier.

Prevention

  • Don't call blocking APIs inside synchronized.
  • Replace synchronized with ReentrantLock in hot paths that pass through tracing / logging / any library you don't control: ReentrantLock is VT-aware and releases the carrier while the VT parks.
  • Audit transitive paths, not just direct calls. The Netflix case was inside Micrometer Tracing's bridge, not the user's synchronized block — library code inside your own synchronized is still pinning-hazardous.
  • Monitor for pinning: JDK 21 supports -Djdk.tracePinnedThreads=full or -Djdk.tracePinnedThreads=short to log pin events.
  • Sizing the carrier pool larger than the default is not a fix: it raises the pinning threshold but doesn't eliminate the failure class.
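The ReentrantLock migration from the second bullet, as a minimal before/after sketch (class and method names are illustrative):

```java
import java.util.concurrent.locks.ReentrantLock;

public class GuardedPath {
    private final ReentrantLock lock = new ReentrantLock();

    // Before (pins on JDK 21): synchronized (this) { blockingCall(); }
    // After: while this VT parks inside blockingCall(), the carrier is
    // released for other virtual threads, even though the lock is held.
    void guardedBlockingCall() throws InterruptedException {
        lock.lock();
        try {
            blockingCall();
        } finally {
            lock.unlock();
        }
    }

    private void blockingCall() throws InterruptedException {
        Thread.sleep(50); // stand-in for I/O, tracing, etc.
    }

    public static void main(String[] args) throws Exception {
        GuardedPath p = new GuardedPath();
        Thread vt = Thread.ofVirtual().start(() -> {
            try { p.guardedBlockingCall(); } catch (InterruptedException ignored) { }
        });
        vt.join();
        System.out.println("ok");
    }
}
```

ReentrantLock is safe on both sides of the region: a contended lock() parks the VT and frees the carrier, and holding the lock does not prevent unmounting during the blocking call.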

JDK roadmap

The structural fix is to make VTs unmount across synchronized the same way they do across ReentrantLock. See JEP 491 ("Synchronize Virtual Threads without Pinning"), targeted at post-Java 21 releases. Until that lands and is adopted widely, pinning remains an operational hazard every VT-adopting service must audit for. Netflix canonicalises this as a language-runtime upstream-fix instance on the wiki.

Seen in

  • sources/2024-07-29-netflix-java-21-virtual-threads-dude-wheres-my-lock — Canonical wiki instance. 4 virtual threads pinned inside synchronized on the Brave span-finish path exhausted all 4 carrier threads on 4-vCPU Netflix microservice instances, causing fleet-wide starvation deadlock with closeWait socket pile-up. The lock owner (AsyncReporter flusher) could not reacquire the lock after Condition.awaitNanos because AQS's FIFO queue placed it behind the pinned VTs.