CONCEPT Cited by 1 source
if-let lock-scope bug¶
In Rust, an if let expression that acquires a lock in its
scrutinee holds that lock across both the if arm and
the else arm — a surprise for programmers who mentally model
if let as a conditional (lock held only inside the if).
The code shape¶
if let (Some(Load::Local(load))) = (&self.load.read().get(...)) {
    // do a bunch of stuff with `load`
} else {
    self.init_for(...); // ← lock is STILL held here
}
Here self.load.read() acquires a read lock on a RwLock, and
.get(...) returns an Option that if let pattern-matches.
The lock guard is a temporary created in the scrutinee. Under
Rust's drop rules in editions up to 2021, a scrutinee temporary
lives until the end of the entire if let expression, including
the else arm. (The Rust 2024 edition changed this so that these
temporaries are dropped before the else block runs.)
Why the bug is easy to introduce¶
"match can be cumbersome, and so there are shorthands. One
of them is if let, which is syntax that makes a pattern match
read like a classic if statement." "The bug is subtle: in
that code, the lock self.load.read().get() takes is held not
just for the duration of the 'if' arm, but also for the 'else'
arm — you can think of if let expressions as being rewritten
to the equivalent match expression, where that lifespan is
much clearer."
(Source: sources/2025-05-28-flyio-parking-lot-ffffffffffffffff)
The mental model "this reads like an if statement" and the
semantic "this is a match expression with one scope" diverge
on exactly the else arm. If the else arm takes a write
lock on the same RwLock (or calls something that does
transitively), it deadlocks.
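The match rewrite described in the quote can be sketched as follows. This is a minimal illustration, not Fly.io's code: the Registry type, its field layout, and the method name are hypothetical, and std::sync::RwLock stands in for whichever lock the real code uses. It uses try_write as a non-blocking probe, so the sketch reports the conflict instead of deadlocking. The match form is used because it keeps scrutinee temporaries alive across every arm.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical stand-in for the structure behind `self.load`.
struct Registry {
    load: RwLock<HashMap<String, u32>>,
}

impl Registry {
    /// Returns true if the read guard taken in the scrutinee is still alive
    /// when the fallback arm runs.
    fn fallback_arm_holds_read_lock(&self, key: &str) -> bool {
        match self.load.read().unwrap().get(key) {
            Some(_value) => false,
            // The read guard is a temporary of the scrutinee, so it is not
            // dropped until the whole `match` ends. A blocking `write()`
            // here would wait on this thread's own read guard.
            None => self.load.try_write().is_err(),
        }
    }
}

fn main() {
    let registry = Registry {
        load: RwLock::new(HashMap::new()),
    };
    let held = registry.fallback_arm_holds_read_lock("unused-app");
    println!("read lock still held in fallback arm: {}", held);
}
```

Swapping try_write for write in the None arm would turn the probe into the self-deadlock this page describes.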
Fly.io's 2024 production-scale consequence¶
On the 2024-era fly-proxy Anycast routing layer, an if let
over self.load.read().get() had an else arm that called
self.init_for(...), which re-entered and took a write lock on
the same RwLock. When a Corrosion update for an app nobody used
flowed fleet-wide in milliseconds, every fly-proxy instance hit
the else arm in rapid succession and deadlocked itself on the
read lock it was still holding. The result was a global Anycast
routing deadlock.
"It occurred on a code path in fly-proxy that was triggered by a Corrosion update propagating from host to host across our fleet in millisecond intervals of time, converging quickly, like a good little routing protocol, on a global consensus that our entire Anycast routing layer should be deadlocked. Fun times." (Source: sources/2025-05-28-flyio-parking-lot-ffffffffffffffff)
This is the wiki's canonical instance of a local Rust syntax trap composing with a global state-propagation protocol to produce a fleet-wide outage. It is also the motivating case for replacing RAII lock guards with explicit closures, which make lock lifespans visible in the code.
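The closure approach can be sketched as follows. This is an illustrative shape only: with_read and lookup_or_init are hypothetical names, and std::sync::RwLock is assumed in place of the lock type fly-proxy actually uses. The point is that the read guard cannot outlive the closure call, so the fallback path is visibly outside the critical section.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical helper: the read guard lives exactly as long as this call.
/// `R` cannot borrow from the guarded data, so nothing keeps the lock alive.
fn with_read<T, R, F>(lock: &RwLock<T>, f: F) -> R
where
    F: FnOnce(&T) -> R,
{
    let guard = lock.read().unwrap();
    f(&*guard)
}

/// Looks up `key`, initializing it to 0 under the write lock if absent.
fn lookup_or_init(lock: &RwLock<HashMap<String, u32>>, key: &str) -> u32 {
    // Critical section 1: the read lock is released when `with_read` returns.
    let found = with_read(lock, |map| map.get(key).copied());
    match found {
        Some(value) => value,
        None => {
            // Critical section 2: no read guard is outstanding here.
            // `or_insert` re-checks under the write lock, in case another
            // thread inserted the key between the two sections.
            let mut map = lock.write().unwrap();
            let value = *map.entry(key.to_string()).or_insert(0);
            value
        }
    }
}

fn main() {
    let lock = RwLock::new(HashMap::new());
    let value = lookup_or_init(&lock, "unused-app");
    println!("initialized value: {}", value);
}
```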
Prevention¶
- Prefer explicit closures for critical sections: the lock-guard lifetime is exactly the closure body.
- Never call code that may re-acquire the same lock from the else arm of an if let whose scrutinee took the lock.
- Hoist the lock-guarded read into a local variable before the branch. The scope is then explicit, and the guard can be dropped before the else arm executes.
- Clippy lint clippy::if_let_mutex catches the Mutex variant but not every RwLock re-entrance shape; it's necessary but not sufficient.
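The hoisting advice can be sketched as follows. Registry, get_or_init, and the body of init_for are hypothetical stand-ins (the source only shows that init_for takes a write lock on the same lock), and std::sync::RwLock is assumed. The read happens in its own let statement, so the guard is dropped at that statement's semicolon, before any branch runs.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical stand-in for the structure behind `self.load`.
struct Registry {
    load: RwLock<HashMap<String, u32>>,
}

impl Registry {
    /// Hypothetical stand-in for `init_for`: takes the write lock.
    fn init_for(&self, key: &str) -> u32 {
        let mut map = self.load.write().unwrap();
        let value = *map.entry(key.to_string()).or_insert(0);
        value
    }

    fn get_or_init(&self, key: &str) -> u32 {
        // Hoisted read: the guard is a temporary of this `let` statement and
        // is dropped at its semicolon. `.copied()` detaches the value from
        // the guard so nothing borrows the locked map afterwards.
        let found = self.load.read().unwrap().get(key).copied();
        if let Some(value) = found {
            value
        } else {
            // No read guard is alive here, so taking the write lock is safe.
            self.init_for(key)
        }
    }
}

fn main() {
    let registry = Registry {
        load: RwLock::new(HashMap::new()),
    };
    println!("value: {}", registry.get_or_init("unused-app"));
}
```

Copying the value out only works when the guarded data is cheap to copy or clone. For larger values, doing the work inside a closure-scoped critical section may be the better fit.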
Seen in¶
- sources/2025-05-28-flyio-parking-lot-ffffffffffffffff —
Canonical instance: this bug caused the 2024 fly-proxy global Anycast deadlock. According to the 2025 follow-up post, every if let over a lock was audited out of the codebase before the second (unrelated) lockup wave was debugged.
Related¶
- systems/fly-proxy — The system where the bug manifested.
- systems/parking-lot-rust — The lock library (bug is language-level, not library-specific).
- patterns/raii-to-explicit-closure-for-lock-visibility — The refactor that makes such bugs visible.
- concepts/deadlock-vs-lock-contention — The class of bug this is (pure deadlock).
- companies/flyio — Fly.io.