CONCEPT
Spurious-wakeup busy-loop¶
Definition¶
A spurious-wakeup busy-loop is the pathology where an
async-Rust (or more generally, any poll-driven) state machine
signals readiness to its executor without actually having
anything new to do, causing the executor to re-enter poll in a
tight loop and burn a CPU core at near 100%. The symptom is
"high CPU, low work" — "samples that almost terminate in libc,
but spend next to no time in the kernel doing actual I/O"
(Source: sources/2025-02-26-flyio-taming-a-voracious-rust-proxy).
The two shapes¶
Per Fly.io's 2025-02-26 post, the pathology takes two shapes in async Rust:
- Pending that wakes itself. A Future returns Pending and fires its own Waker — telling the executor "not ready, poll me again soon" — before anything has changed. The executor re-polls; same state; same spurious wake. Cycle.
- Ready that doesn't progress. An AsyncRead returns Ready without actually consuming data or advancing its state machine. The caller — faithfully looping poll_read until it stops being Ready, per the contract — spins on it.
Both collapse to the same pathology: the Future looks alive to the executor but isn't making progress, and nothing external is going to wake it up and reset the loop.
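Shape 1 can be reproduced in a few lines of plain Rust. A minimal sketch — the SelfWaking future, the no-op Waker, and the manual poll loop are all hypothetical stand-ins for a buggy future and its executor, not anything from fly-proxy:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Hypothetical buggy future: it wakes its own Waker and then returns
// Pending, so the executor re-queues it even though nothing changed.
struct SelfWaking {
    polls: usize,
}

impl Future for SelfWaking {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        self.polls += 1;
        // The bug: "not ready, but poll me again soon" -- fired before
        // any external event. This is the seed of the busy-loop.
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

// A no-op Waker so poll() can be driven by hand, without a runtime.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

// Mimic an executor that faithfully re-polls whenever the task wakes
// itself, capped at `limit` iterations so the demo terminates.
fn spin_poll(limit: usize) -> usize {
    let mut fut = SelfWaking { polls: 0 };
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    for _ in 0..limit {
        if let Poll::Ready(()) = Pin::new(&mut fut).poll(&mut cx) {
            break;
        }
        // Pending + self-wake: a real executor immediately re-queues the
        // task, so the loop spins. Nothing external ever changes.
    }
    fut.polls
}

fn main() {
    println!("polled {} times, zero progress", spin_poll(100));
}
```

A real executor has no iteration cap, which is why the loop pegs a core at ~100% instead of terminating. Shape 2 is the mirror image: the spin lives in the caller's poll_read loop rather than in the executor's run queue.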
Canonical diagnosis signal¶
Flamegraph profiling turns the
pathology inside out: if the trace is dominated by low-level
runtime or span-cost infrastructure (tracing::Subscriber
entering/exiting spans, tokio polling machinery, libc syscalls
that return immediately) with almost nothing in the actual
business-logic leaves, that's the signature. As Fly.io puts it
— entering/exiting a tracing span in Tokio is supposed to be
very fast, so if it dominates the profile, the code being
traced must be doing essentially nothing, which means something
is calling poll a lot and getting nothing back.
The Future's fully-qualified type in the flamegraph then identifies the guilty layer. Fly.io's 2025-02 case:
&mut fp_io::copy::Duplex<&mut fp_io::reusable_reader::ReusableReader<
fp_tcp::peek::PeekableReader<
tokio_rustls::server::TlsStream<…>>>, …>
The own-code wrappers (Duplex, ReusableReader, PeekableReader,
MeteredIo, PermittedTcpStream) were audited first; the one
third-party layer — tokio_rustls::server::TlsStream — turned out
to be guilty, via the concepts/tls-close-notify edge case.
Why this is insidious¶
- The bug is in one layer but the symptom is consumption of a whole CPU. Every wrapper between the bug and the executor is a false suspect.
- CPU-pegging incidents present as "platform degradation" but the platform is fine; one or two tasks are just refusing to yield.
- Routine mitigation (bouncing the process) clears the stuck tasks but not the trigger condition — it comes back as soon as traffic hits the right state again.
- Cheap instrumentation would help but isn't default — patterns/spurious-wakeup-metric is Fly.io's explicit follow-up: "Spurious wakeups should be easy to spot, and triggering a metric when they happen should be cheap, because they're not supposed to happen often."
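Fly.io has not published its metric's implementation, but the quoted follow-up suggests detection can be as cheap as one counter per task. A hypothetical sketch — PollMeter, its threshold, and the consecutive-Pending heuristic are all assumptions, and a production version would also check that the wake arrived without any I/O readiness change, since long Pending streaks with real time between polls are perfectly normal:

```rust
use std::future::{poll_fn, Future};
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Hypothetical wrapper: counts consecutive Pending polls of the inner
// future and records a suspected spin once per streak past a threshold.
struct PollMeter<F> {
    inner: Pin<Box<F>>,
    pending_streak: u32,
    suspected_spins: u32,
    threshold: u32,
}

impl<F: Future> Future for PollMeter<F> {
    type Output = F::Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<F::Output> {
        let this = self.get_mut();
        match this.inner.as_mut().poll(cx) {
            Poll::Ready(v) => {
                this.pending_streak = 0; // real progress resets the streak
                Poll::Ready(v)
            }
            Poll::Pending => {
                this.pending_streak += 1;
                if this.pending_streak == this.threshold {
                    // Cheap: one increment per streak. In production this
                    // would bump a metrics counter instead of a field.
                    this.suspected_spins += 1;
                }
                Poll::Pending
            }
        }
    }
}

// No-op Waker so the demo can poll by hand, without a runtime.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

// Drive a shape-1 future (always Pending, always self-waking) through the
// meter `polls` times and report how many suspected spins were recorded.
fn demo(polls: u32, threshold: u32) -> u32 {
    let inner = poll_fn(|cx: &mut Context<'_>| {
        cx.waker().wake_by_ref();
        Poll::<()>::Pending
    });
    let mut metered = PollMeter {
        inner: Box::pin(inner),
        pending_streak: 0,
        suspected_spins: 0,
        threshold,
    };
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    for _ in 0..polls {
        let _ = Pin::new(&mut metered).poll(&mut cx);
    }
    metered.suspected_spins
}

fn main() {
    println!("suspected spins: {}", demo(10, 5));
}
```

Because a wrapper like this sits on the poll path of every task, the per-poll cost must stay at a compare-and-increment — which matches the quote's premise that the signal should be cheap precisely because spurious wakeups are rare.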
Seen in¶
- sources/2025-02-26-flyio-taming-a-voracious-rust-proxy —
canonical wiki instance. Two IAD edge hosts pegged on
systems/fly-proxy; a partner load test
(Tigris) was producing TLS connections
that closed with buffered data on the socket,
which exposed a tokio-rustls
TlsStream Waker bug. Flamegraph → Future type → suspect narrowing → upstream rustls PR #1950.
Related¶
- concepts/async-rust-future / concepts/rust-waker / concepts/asyncread-contract — the primitives this pathology violates.
- concepts/cpu-busy-loop-incident — the operational frame.
- concepts/flamegraph-profiling — the canonical diagnostic move.
- patterns/spurious-wakeup-metric — the cheap instrumentation mitigation Fly.io committed to.
- patterns/flamegraph-to-upstream-fix — full arc from symptom to patch.
- systems/tokio / systems/rustls / systems/tokio-rustls — the Rust-async stack most subject to this pattern.
- companies/flyio