CONCEPT Cited by 1 source
Bitwise double-free¶
A bitwise double-free is the race-condition shape where two code paths attempt to atomically clear the same bit of a tightly-packed state word without coordinating — the second clear executes on an already-cleared bit and, because the atomic clear is often implemented as an add of the two's-complement inverse for speed (see concepts/lock-state-self-synchronizing), it adds a large value to the word instead of zeroing a bit. The word overflows, wraps, or saturates into an inconsistent state — the bits double-freed, the word catastrophically corrupted.
Analogy to allocator double-free¶
In a conventional double-free on an allocator, two free(p)
calls on the same pointer corrupt the allocator's free-list
metadata. The corruption is visible only later, when something
tries to use the allocator. Bitwise double-free is the same
shape in a different register: the "allocation" is a specific
bit, the "free-list" is a packed state word, and the
corruption becomes visible when something next tries to consume
or update the word.
The 2025-05-28 Fly.io / parking_lot instance¶
parking_lot's RwLock packs
state into a 64-bit word: 4 signaling bits (PARKED,
WRITER_PARKED, WRITER, UPGRADEABLE) + 60-bit reader
counter. Clearing WRITER | WRITER_PARKED was done by:
let state = self.state.fetch_add(
prev_value.wrapping_sub(WRITER_BIT | WRITER_PARKED_BIT),
Ordering::Relaxed,
);
— assuming prev_value is the word's current state. If
WRITER_PARKED was already cleared by a reader-release-path
that unparked the waiting writer, a subsequent timeout-path in
the same writer thread would also try to clear
WRITER_PARKED (because from its local view of events, it
timed out without being unparked). The subtraction no longer
cancels — the state becomes 0xFFFFFFFFFFFFFFFE. Reader
counter saturated. All 4 signal bits set on the next
transition. Lock word reaches 0xFFFFFFFFFFFFFFFF. No thread
is in the critical section; every thread wants the lock;
neither readers nor writers can make progress.
Thomas Ptacek's framing:
"This pattern is self-synchronizing, but it relies on an invariant: you'd better be right about the original state of the word you're altering. Because if you're wrong, you're adding a very large value to an uncontrolled value." (Source: sources/2025-05-28-flyio-parking-lot-ffffffffffffffff)
Symptom vs cause asymmetry¶
The bitwise double-free produces an artificial deadlock: threads wait for a lock that no thread holds. This is pathologically confusing because:
- Standard deadlock detectors that track waiter-graphs (as
parking_lot::deadlockdoes) find nothing — there's no cycle because there's no owner. - Core dumps look like "everyone waits, no critical section" — literally impossible under the mental model of how locks work.
- Theories that predict slow readers / writer starvation / async-lock confusion all fail to explain the evidence (see concepts/descent-into-madness-debugging).
The discriminating clue, in this case, came from switching to
read_recursive as a desperation probe and seeing
RwLock reader count overflow messages in logs. The reader
counter's saturation was the first direct evidence that the
word itself was corrupted.
Fix shape¶
Fix the coordination between the two clear paths. For
parking_lot PR #466, the writer bit is cleared separately
in the same wakeup queue as the reader so the two clears are
mutually exclusive. The optimisation's invariant now holds.
Seen in¶
- sources/2025-05-28-flyio-parking-lot-ffffffffffffffff —
Canonical wiki instance.
parking_lotRWLock state corruption via race between reader-release-path unparking the writer and timeout-path in the writer; upstream fix PR #466 issue #465.
Related¶
- systems/parking-lot-rust — The library where the bug class was canonicalised.
- concepts/lock-state-self-synchronizing — The optimisation that enables the bug class.
- concepts/descent-into-madness-debugging — How it presents to the debugger.
- patterns/upstream-the-fix — How Fly.io delivered the PR #466 fix.
- companies/flyio — Fly.io.