PATTERN Cited by 1 source
Diagnose via heap-dump lock introspection¶
Problem¶
Thread dumps tell you who is waiting for what, but not always who holds what. Several failure modes make the thread dump silent about the lock owner:
- The primary dumping tool drops the metadata. In Java 21, jcmd Thread.dump_to_file does not include "- locked" / "Locked ownable synchronizers" entries, so VT-capable thread dumps carry no lock-owner information.
- The "owner" does not actually exist. When a lock is corrupted (stale AQS state, bitwise double-free on the lock word) or when the owner has released via a Condition.awaitNanos path and is requeued, no thread shows as the current owner.
- The thread dump is sampled at a moment where the owner is in an uninstrumented transition (between release and requeue).
Without the owner, you can't reason about why waiters don't progress.
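For contrast, the sketch below shows the kind of lock-owner metadata the problem statement is about, as it is exposed in-process for platform threads. The class and method names are hypothetical; ThreadMXBean.getThreadInfo and ThreadInfo.getLockOwnerId are standard java.lang.management calls. It assumes a HotSpot JVM, which reports the owner of an ownable synchronizer that a thread is parked on.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class: shows the owner metadata a thread dump normally carries.
final class LockOwnerFromThreadInfo {

    /** True if the ThreadInfo of a parked waiter names the holder as lock owner. */
    static boolean waiterReportsOwner() {
        ReentrantLock lock = new ReentrantLock();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread holder = new Thread(() -> {
            lock.lock();
            try {
                held.countDown();
                release.await();              // hold the lock until the sample is taken
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        }, "holder");
        Thread waiter = new Thread(() -> {
            lock.lock();                      // parks here while the holder owns the lock
            lock.unlock();
        }, "waiter");

        try {
            holder.start();
            held.await();
            waiter.start();
            while (waiter.getState() != Thread.State.WAITING) {
                Thread.sleep(5);              // wait until the waiter is parked
            }
            ThreadInfo info =
                ManagementFactory.getThreadMXBean().getThreadInfo(waiter.getId());
            boolean ownerSeen = info != null && info.getLockOwnerId() == holder.getId();
            release.countDown();
            holder.join();
            waiter.join();
            return ownerSeen;
        } catch (InterruptedException e) {
            release.countDown();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while sampling", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("waiter reports owner: " + waiterReportsOwner());
    }
}
```

When this link from waiter to owner is missing from the dump, or the owner field is genuinely empty, the waiter's stack trace alone cannot explain the stall.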
Pattern¶
When the thread dump is exhausted as an evidence source:
- Capture a heap dump of the same process. jcmd <pid> GC.heap_dump /path/heap.hprof for JVMs; core dumps via gcore / gdb / kill -SIGQUIT for native runtimes.
- Identify the lock object on the heap by walking from a known waiter's stack-local references. Eclipse MAT's "inspector" pane is excellent for this.
- Read the lock's internal state fields directly (the AQS state word, exclusiveOwnerThread, waiter queue pointers) and cross-reference with the thread dump's thread IDs.
- Reverse-engineer what you need against the lock implementation source code (Java: AbstractQueuedSynchronizer.java; Rust: parking_lot's raw_rwlock.rs; etc.). You don't need to understand every line, just enough to interpret the observed state.
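The capture step can also be triggered from inside the JVM, which may help when a watchdog detects the hang before anyone can run jcmd. This is a minimal sketch, assuming a HotSpot-based JDK where com.sun.management.HotSpotDiagnosticMXBean is available; the HeapDumper class and its dump method are hypothetical names.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Path;

// Hypothetical helper: in-process equivalent of "jcmd <pid> GC.heap_dump <file>".
final class HeapDumper {

    /**
     * Writes an hprof snapshot of the current JVM to the given file.
     * The file name should end in ".hprof" and the file must not already exist.
     * liveOnly = true dumps only reachable objects.
     */
    static Path dump(Path target, boolean liveOnly) {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            bean.dumpHeap(target.toAbsolutePath().toString(), liveOnly);
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path out = Path.of(System.getProperty("java.io.tmpdir"),
                           "lock-debug-" + System.nanoTime() + ".hprof");
        System.out.println("heap dump written to " + dump(out, false));
    }
}
```

Take the thread dump as close in time to the heap dump as possible, so the thread IDs and waiter queue being cross-referenced describe the same moment.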
Why it works¶
- All lock state is data. Every synchronization primitive represents its state as memory that can be read, however clever the encoding (bitpacked words, AQS queue links, per-thread ParkBlocker references).
- Heap dumps are complete snapshots. Thread dumps are metadata about threads; heap dumps are state about the actual runtime objects, including the locks.
- Cross-referencing with the thread dump turns an ambiguous scene into a fully reconstructed one: which threads are in the waiter queue, in what order, with what stack traces.
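As an illustration of "bitpacked words", the int state field of an AQS-based lock can be decoded by hand once it has been read from the heap dump. For ReentrantLock the value is simply the hold count (0 means unlocked). ReentrantReadWriteLock splits it into two unsigned 16-bit halves: the upper half counts read holds, the lower half counts write holds. The sketch below decodes that layout; the class and method names are hypothetical.

```java
// Hypothetical decoder for the AQS "state" int of a ReentrantReadWriteLock,
// as read from a heap dump. Upper 16 bits: shared (read) hold count;
// lower 16 bits: exclusive (write) hold count.
final class RwLockStateWord {
    static final int SHARED_SHIFT = 16;
    static final int EXCLUSIVE_MASK = (1 << SHARED_SHIFT) - 1;

    static int readHolds(int state) {
        return state >>> SHARED_SHIFT;
    }

    static int writeHolds(int state) {
        return state & EXCLUSIVE_MASK;
    }

    static String describe(int state) {
        return "readHolds=" + readHolds(state) + " writeHolds=" + writeHolds(state);
    }

    public static void main(String[] args) {
        System.out.println(describe(0x00030000)); // three read holds, no writer
        System.out.println(describe(0x00000002)); // writer holding reentrantly, twice
    }
}
```

A non-zero state with a null exclusiveOwnerThread, or a zero state with a populated waiter queue, are the kinds of combinations worth checking against the implementation source.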
Canonical wiki instance — Netflix 2024-07-29¶
Netflix had a JVM hung with
pinned virtual threads.
Thread dump:
- 4 VTs blocked on ReentrantLock.lock inside
synchronized on the Brave span-finish path.
- 1 more VT blocked on the same lock via a different
(non-synchronized) path.
- 1 platform thread (AsyncReporter flusher) blocked on
the same lock, in the AQS.acquire post-awaitNanos
reacquire path.
- No thread showing as lock owner.
The heap dump, inspected via Eclipse MAT, revealed:
- The ReentrantLock's AQS state showed no
exclusiveOwnerThread.
- The AQS waiter queue contained all 6 threads, in a FIFO
order that put the flusher behind the pinned VTs.
- The Condition's internal queue confirmed the flusher's
recent awaitNanos release.
Interpretation: the flusher had the lock, released via
awaitNanos, timed out, and was queued behind the already-
waiting pinned VTs. The pinned VTs can't release their
carriers (they're pinned) so can't be the next acquirer.
The flusher is behind them in FIFO order. Starvation
deadlock — visible only from the heap.
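The "no owner" half of this picture can be reproduced in isolation. The sketch below is not Netflix's code and does not reproduce the pinning or the starvation; it only shows that a thread sitting in Condition.awaitNanos has released the lock, so an observer sees a lock with no owner. OwnerlessLockDemo, InspectableLock and the "flusher" thread name are hypothetical; getOwner and getQueuedThreads are protected ReentrantLock methods that the subclass widens for inspection.

```java
import java.util.Collection;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo: a lock looks ownerless while its last holder is in awaitNanos.
final class OwnerlessLockDemo {

    /** Exposes ReentrantLock's protected introspection methods. */
    static final class InspectableLock extends ReentrantLock {
        Thread owner() { return getOwner(); }
        Collection<Thread> waiters() { return getQueuedThreads(); }
    }

    /** True if the lock had no owner while the flusher thread was still alive and waiting. */
    static boolean ownerlessWhileAwaiting() {
        InspectableLock lock = new InspectableLock();
        Condition workAvailable = lock.newCondition();
        CountDownLatch acquired = new CountDownLatch(1);

        Thread flusher = new Thread(() -> {
            lock.lock();
            try {
                acquired.countDown();
                // awaitNanos releases the lock and must reacquire it before returning.
                workAvailable.awaitNanos(TimeUnit.SECONDS.toNanos(30));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        }, "flusher");

        try {
            flusher.start();
            acquired.await();
            while (lock.isLocked()) {
                Thread.sleep(5);              // wait for awaitNanos to release the lock
            }
            boolean ownerless = lock.owner() == null && flusher.isAlive();

            lock.lock();                      // wake the flusher so the demo terminates
            try {
                workAvailable.signalAll();
            } finally {
                lock.unlock();
            }
            flusher.join();
            return ownerless;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while observing", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("ownerless while awaiting: " + ownerlessWhileAwaiting());
    }
}
```

In the incident the same owner-less state persisted because the threads queued for the lock could not make progress, which is what the heap dump's waiter queue made visible.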
Related technique: native core-dump lock introspection¶
Same spirit, different substrate. Fly.io's 2025-05-28
investigation
(sources/2025-05-28-flyio-parking-lot-ffffffffffffffff)
read the 64-bit parking_lot lock word from a Rust
core dump using gdb, identifying a
concepts/bitwise-double-free corruption pattern. Same
pattern — "when the thread dump is silent, the heap / core
dump isn't".
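Decoding such a lock word follows the same idea as decoding AQS state. The sketch below is written in Java for consistency with the rest of this page, and the bit layout is an assumption based on a reading of parking_lot's raw_rwlock.rs (four flag bits in the low nibble, reader count in the remaining bits); verify it against the exact parking_lot version in the binary before relying on it. The class name is hypothetical.

```java
// Hypothetical decoder for a 64-bit parking_lot RwLock word read from a core dump.
// ASSUMED layout (check raw_rwlock.rs for the version in use):
//   bit 0: PARKED_BIT, bit 1: WRITER_PARKED_BIT,
//   bit 2: UPGRADABLE_BIT, bit 3: WRITER_BIT, bits 4..63: reader count.
final class ParkingLotRwLockWord {
    static final long PARKED_BIT        = 0b0001L;
    static final long WRITER_PARKED_BIT = 0b0010L;
    static final long UPGRADABLE_BIT    = 0b0100L;
    static final long WRITER_BIT        = 0b1000L;
    static final int  READER_SHIFT      = 4;

    static boolean isSet(long word, long bit) {
        return (word & bit) != 0;
    }

    static long readers(long word) {
        return word >>> READER_SHIFT;     // unsigned shift: the word is not a signed value
    }

    static String describe(long word) {
        return "parked=" + isSet(word, PARKED_BIT)
            + " writerParked=" + isSet(word, WRITER_PARKED_BIT)
            + " upgradable=" + isSet(word, UPGRADABLE_BIT)
            + " writer=" + isSet(word, WRITER_BIT)
            + " readers=" + readers(word);
    }

    public static void main(String[] args) {
        System.out.println(describe(0xFFFFFFFFFFFFFFFFL));
    }
}
```

Under that assumed layout, a word of all ones decodes to every flag set plus an impossibly large reader count, which no legal sequence of lock operations should produce; that is the signal that the word was corrupted rather than merely contended.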
Caveats¶
- Heap dumps are large — hundreds of MB to GB — and take seconds to capture. Not suitable as a first-touch tool on every ticket.
- Reading AQS state requires familiarity with the source code. Investment is one-time but non-trivial.
- Eclipse MAT (or equivalent) is necessary — raw hprof inspection is impractical.
- Not a substitute for better tooling. The right long-term fix for Netflix's case is JDK adding lock metadata back to the jcmd output. Heap-dump introspection is the fallback, not the workflow.
- Heap dumps may not be allowed in some environments (PII in memory, export-control-sensitive data).
Seen in¶
- sources/2024-07-29-netflix-java-21-virtual-threads-dude-wheres-my-lock
— Canonical wiki instance. Netflix used Eclipse MAT +
reverse-engineered AQS against
AbstractQueuedSynchronizer.java source to identify a VT-pinning-caused starvation deadlock where no thread showed as the lock owner.
Related¶
- concepts/heap-dump-lock-introspection — The technique.
- concepts/jcmd-thread-dump — The primary tool whose insufficiency forces this fallback.
- concepts/virtual-thread-pinning — The Netflix bug class.
- patterns/upstream-the-fix — The long-term fix for the tooling gap (JDK restoring lock metadata).
- sources/2025-05-28-flyio-parking-lot-ffffffffffffffff — Rust / core-dump sibling instance.
- companies/netflix — Java-side canonical instance.