Skip to content

CONCEPT Cited by 1 source

Lock-state self-synchronizing update

A self-synchronizing atomic update is the optimisation of clearing specific bits of a packed state word not by load-modify-CAS but by atomically adding a specific delta such that — if the word is in the expected state — the add zeroes the target bits in one instruction.

The trick

Given a word W that you expect to have bits S set, you want to clear the bits M ⊆ S. The straightforward implementation:

loop {
    let old = W.load();
    let new = old & ~M;
    if W.compare_and_swap(old, new).is_ok() { break; }
}

— needs a CAS retry loop. The self-synchronizing version is:

let state = W.fetch_add(prev_value.wrapping_sub(M), Ordering::Relaxed);

which is equivalent to W & ~M if W == prev_value. The fetch_add is a single atomic instruction on most architectures; no retry loop; no CAS failure branch.

Why it works

If prev_value has exactly M set (and nothing else), prev_value.wrapping_sub(M) = 0 and fetch_add(0) is a no-op → wrong. If prev_value has M set (and possibly other bits): prev_value.wrapping_sub(M) is the two's-complement inverse of M minus the other bits; adding it to prev_value clears M and leaves the rest. Because two's-complement arithmetic wraps, the same add clears M for any value of W that equals prev_value.

The invariant is that W is exactly prev_value at the moment of the add. If it's anything else, you've added a very large value to the word.

Why it's used

Single-instruction atomics (e.g. LOCK XADD on x86) are substantially faster than CAS-retry loops under contention. parking_lot's RwLock uses this trick extensively in its wake-up paths.

Why it's dangerous

From Fly.io's 2025-05-28 post:

"This pattern is self-synchronizing, but it relies on an invariant: you'd better be right about the original state of the word you're altering. Because if you're wrong, you're adding a very large value to an uncontrolled value." (Source: sources/2025-05-28-flyio-parking-lot-ffffffffffffffff)

The bug class it enables is bitwise double-free — two code paths each think they're the one clearing a bit; the first's clear succeeds; the second's add doesn't cancel because the state has already advanced; the word overflows.

When to use

  • When profiling shows a CAS-retry loop is the contention hot spot and the number of distinct state transitions is small enough to enumerate.
  • When every code path that clears a given bit provably agrees on which transition it's implementing (validated by formal methods, code review, or rigorous test coverage of the happy paths and race windows).

When not to

Any time the "who clears this bit" answer is distributed across multiple wake-up paths that can execute in either order — e.g. parking_lot's pre-PR-#466 try_write_for timeout path + reader-release-unpark path both claiming to clear WRITER_PARKED.

Seen in

Last updated · 200 distilled / 1,178 read