
CONCEPT Cited by 1 source

Heap-dump lock introspection

Heap-dump lock introspection is the diagnostic technique of reading the on-heap state of a lock object to determine its current owner and waiter queue — a fallback for situations where the thread dump doesn't (or can't) report lock metadata.

When to use it

Use it when:

  1. The thread dump is exhausted as an evidence source — it shows N threads waiting on some lock, but no thread shows as the owner.
  2. The tooling has known limitations that drop lock metadata. Java 21's jcmd Thread.dump_to_file is one example: its output has no "- locked" lines and no "Locked ownable synchronizers" sections.
  3. The evidence contradicts your working hypothesis. At that point you need to inspect the lock object itself rather than reason about who should own it.

How it works (Java specifically)

Java's ReentrantLock, ReentrantReadWriteLock, Semaphore, CountDownLatch, and most other java.util.concurrent synchronizers delegate their state to an internal object that extends AbstractQueuedSynchronizer (AQS). For ReentrantLock this is the object behind its sync field. It is a regular Java object on the heap, with fields:

  • state: an int whose meaning each synchronizer defines (the AbstractQueuedLongSynchronizer variant uses a long). For ReentrantLock, state > 0 means a thread holds the lock with a hold count equal to state; 0 means free.
  • exclusiveOwnerThread: reference to the Thread that currently holds the lock, or null. It is inherited from AbstractOwnableSynchronizer.
  • head / tail: the two ends of the FIFO queue of threads waiting to acquire the lock.
  • firstWaiter / lastWaiter (on ConditionObject): the queue of threads parked on a Condition.

All of these are directly readable from a heap dump using Eclipse MAT (or VisualVM, YourKit, jol, etc.).
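
For intuition about what those fields look like while a process is still running, the sketch below reads the live equivalents through ReentrantLock's own monitoring methods. It is an illustration rather than part of the heap-dump workflow: InspectableLock, snapshot, and run are hypothetical names, and the subclass exists only because getOwner() and getQueuedThreads() are protected.

```java
import java.util.List;
import java.util.concurrent.locks.LockSupport;
import java.util.concurrent.locks.ReentrantLock;
import java.util.stream.Collectors;

class LockIntrospectionDemo {

    /** Hypothetical subclass that only exposes ReentrantLock's protected monitoring methods. */
    static final class InspectableLock extends ReentrantLock {
        // Live counterpart of the exclusiveOwnerThread field.
        Thread owner() {
            return getOwner();
        }

        // Live counterpart of walking the head/tail queue of threads waiting to acquire.
        List<String> queuedNames() {
            return getQueuedThreads().stream()
                    .map(Thread::getName)
                    .collect(Collectors.toList());
        }
    }

    /** Renders the lock roughly the way you would summarize it from a heap dump. */
    static String snapshot(InspectableLock lock) {
        Thread owner = lock.owner();
        return "locked=" + lock.isLocked()   // isLocked() is true when state != 0
                + " owner=" + (owner == null ? "null" : owner.getName())
                + " queued=" + lock.queuedNames();
    }

    /** Returns two snapshots: the free lock, then the lock held with one queued waiter. */
    static String[] run() {
        InspectableLock lock = new InspectableLock();
        String free = snapshot(lock);

        Thread waiter = new Thread(() -> {
            lock.lock();
            lock.unlock();
        }, "waiter-1");

        String held;
        lock.lock();
        try {
            waiter.start();
            // Wait until the waiter has actually been enqueued behind us.
            while (!lock.hasQueuedThread(waiter)) {
                LockSupport.parkNanos(1_000_000L);
            }
            held = snapshot(lock);
        } finally {
            lock.unlock();
        }
        try {
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new String[] { free, held };
    }

    public static void main(String[] args) {
        for (String s : run()) {
            System.out.println(s);
        }
    }
}
```

In a heap dump the same information comes from reading the fields directly, which is what makes the technique usable after the fact and on a process you cannot instrument.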

The technique

  1. Take a heap dump: jcmd <pid> GC.heap_dump /tmp/heap.hprof (or jmap). Pair with the jcmd thread dump so you can cross-reference.
  2. Open the heap in Eclipse MAT.
  3. Identify the lock object. Easiest path: find a known waiter thread, walk its stack-local references back to the ReentrantLock instance. (Netflix did exactly this via the AsyncReporter thread's stack.)
  4. Inspect the AQS state fields (state, exclusiveOwnerThread, waiter queue).
  5. Cross-reference thread IDs in the waiter queue with the thread-dump stack traces to reconstruct the full picture.
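
If attaching jcmd is not an option, step 1 can also be done from inside the process. The sketch below assumes a HotSpot-based JVM, where the com.sun.management.HotSpotDiagnosticMXBean platform bean is available; the class name HeapDumpTrigger, its methods, and the temp-directory location are hypothetical choices for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

class HeapDumpTrigger {

    /**
     * Writes a heap dump of the current JVM to target and returns its size in bytes.
     * The file name must end in .hprof, and dumpHeap refuses to overwrite an
     * existing file, so any stale dump at that path is removed first.
     */
    static long dump(Path target, boolean liveOnly) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            Files.deleteIfExists(target);
            // liveOnly = true dumps only reachable objects, which is enough
            // for a lock that waiting threads still reference.
            bean.dumpHeap(target.toString(), liveOnly);
            return Files.size(target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Dumps to a temporary directory, reports the size, and cleans up afterwards. */
    static long demo() {
        try {
            Path dir = Files.createTempDirectory("lock-dump");
            Path file = dir.resolve("heap.hprof");
            long size = dump(file, true);
            Files.deleteIfExists(file);
            Files.deleteIfExists(dir);
            return size;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("heap dump size in bytes: " + demo());
    }
}
```

In a real investigation you would keep the file rather than delete it, and capture a thread dump at as close to the same moment as possible so the two can be cross-referenced in step 5.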

The Netflix 2024-07-29 application

"Finding the lock in the heap dump was relatively straightforward. Using the excellent Eclipse MAT tool, we examined the objects on the stack of the AsyncReporter non-virtual thread to identify the lock object. Reasoning about the current state of the lock was perhaps the trickiest part of our investigation. Most of the relevant code can be found in the AbstractQueuedSynchronizer.java. While we don't claim to fully understand the inner workings of it, we reverse-engineered enough of it to match against what we see in the heap dump." (Source: sources/2024-07-29-netflix-java-21-virtual-threads-dude-wheres-my-lock)

Netflix confirmed from the heap:

  • exclusiveOwnerThread == null: no current owner.
  • The waiter queue contained 6 threads (4 pinned VTs + 1 non-pinned VT + 1 platform AsyncReporter flusher).
  • The Condition's internal queue showed the flusher had released via awaitNanos and been requeued for the lock.

Taken together, this evidence forced the conclusion: the flusher was the most recent owner, it released the lock via awaitNanos, and the FIFO wait queue in AQS placed it behind the pinned VTs. The AQS queue itself had become the de facto structural lock on forward progress.
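
The "released via awaitNanos, then requeued for the lock" transition can be reproduced in miniature with platform threads only. This is a sketch of the mechanism, not a reproduction of the Netflix deadlock (it involves no virtual threads or pinning); the class name AwaitRequeueDemo, the condition name, and the 30-second timeout are hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.LockSupport;
import java.util.concurrent.locks.ReentrantLock;

class AwaitRequeueDemo {

    static String run() {
        ReentrantLock lock = new ReentrantLock();
        Condition notEmpty = lock.newCondition();

        // Stand-in for the flusher: takes the lock, then waits on the condition.
        Thread flusher = new Thread(() -> {
            lock.lock();
            try {
                notEmpty.awaitNanos(TimeUnit.SECONDS.toNanos(30));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();   // awaitNanos always reacquires the lock before returning
            }
        }, "flusher");
        flusher.start();

        boolean parkedOnCondition = false;
        boolean requeuedForLock = false;
        boolean stillOnCondition = true;

        while (!parkedOnCondition) {
            // Being able to take the lock here while the flusher sits inside
            // awaitNanos shows that awaitNanos released it.
            lock.lock();
            try {
                if (lock.hasWaiters(notEmpty)) {
                    parkedOnCondition = true;
                    notEmpty.signal();
                    // signal() moves the flusher from the condition queue to the
                    // lock's wait queue; it cannot run until we unlock.
                    requeuedForLock = lock.hasQueuedThread(flusher);
                    stillOnCondition = lock.hasWaiters(notEmpty);
                }
            } finally {
                lock.unlock();
            }
            if (!parkedOnCondition) {
                LockSupport.parkNanos(1_000_000L);
            }
        }

        try {
            flusher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "parkedOnCondition=" + parkedOnCondition
                + " requeuedForLock=" + requeuedForLock
                + " stillOnCondition=" + stillOnCondition;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Here the requeued thread proceeds as soon as the signalling thread unlocks. In the Netflix case the threads ahead of the flusher in the queue could not make progress, so the flusher stayed queued behind them.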

The same idea applies to Rust, C++, and Go: take a core dump and inspect the lock object's bytes in gdb, dlv, or LLDB. See sources/2025-05-28-flyio-parking-lot-ffffffffffffffff, where Fly.io used exactly this on parking_lot's 64-bit lock word to identify a bitwise double-free bug.

Seen in
