CONCEPT Cited by 1 source
Minimum-cwnd death spiral
Definition
A minimum-cwnd death spiral is a self-perpetuating failure
mode in a loss-based congestion controller where, after a severe
congestion event has collapsed cwnd
to its minimum (typically 2 × MSS), the controller fails to grow
cwnd back up even when loss has completely stopped — because
its own idle-detection logic misreads the transient
bytes_in_flight drain between
ACKs as application idleness and advances the recovery boundary
into the future on every send.
The canonical wiki instance is CUBIC in quiche (pre-2026-05-12 fix), where the 2020 port of the Linux-kernel 2017 idle-period adjustment trapped the connection at the two-packet floor indefinitely (Source: sources/2026-05-12-cloudflare-when-idle-isnt-idle-how-a-linux-kernel-optimization-became-a-quic-bug).
The five-step loop
Given a connection at minimum cwnd (two packets), the death
spiral cycles once per RTT:
1. Send and ACK. The sender transmits the entire two-packet window. After one RTT, both packets are ACKed; bytes_in_flight drops to zero.
2. False idle detection. When the next burst is sent, on_packet_sent() sees bytes_in_flight == 0 and assumes the connection was idle. But it wasn't; it was congestion-limited. The application had data ready to send; the pipe drained because cwnd allowed only two packets.
3. Inflated delta. The idle-delta is computed as now − last_sent_time. At minimum cwnd, last_sent_time is the timestamp of the start of the previous RTT cycle, so the delta is ~RTT (e.g. ~14 ms on a 10 ms-RTT connection, plus jitter). The actual idle duration (the gap between the last ACK arriving and the next packet being sent) is effectively zero.
4. Perceived recovery. Because the recovery boundary has been advanced by ~RTT into the future, in_congestion_recovery() returns true for every incoming ACK. Processing the ACK exits recovery, sets the recovery-start time to the ACK time (which is later than last_sent_time), and on the next send pushes the boundary further into the future.
5. Stagnation. CUBIC skips cwnd growth for any packet perceived to be in recovery. cwnd stays pinned at two packets. The pipe drains again on the next ACK. Goto 1.
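The loop can be reproduced in a toy model. The sketch below is a minimal Python simulation under stated assumptions, not quiche's Rust implementation: ToySender, run_cycles, the 1,350-byte MSS, the jitter-free 10 ms RTT and the one-line growth rule are all hypothetical stand-ins. It models only the boundary shift in on_packet_sent() and the skipped growth, not the congestion_avoidance ↔ recovery state transitions.

```python
from dataclasses import dataclass
from typing import List

MSS = 1350          # bytes; assumed so that two packets equal the 2,700-byte floor
MIN_CWND = 2 * MSS
RTT_MS = 10         # idealised round trip with no jitter (assumption)


@dataclass
class ToySender:
    """Hypothetical, simplified sender state; not quiche's actual types."""
    cwnd: int = MIN_CWND
    bytes_in_flight: int = 0
    last_sent_ms: int = 0
    recovery_start_ms: int = 0   # set by an earlier, real loss event

    def on_packet_sent(self, now_ms: int, size: int) -> None:
        if self.bytes_in_flight == 0:
            # Buggy idle check: a drained pipe is read as application idleness,
            # and the recovery boundary is shifted by the gap since the last send.
            delta = now_ms - self.last_sent_ms
            self.recovery_start_ms += delta
        self.bytes_in_flight += size
        self.last_sent_ms = now_ms

    def in_congestion_recovery(self, sent_ms: int) -> bool:
        return sent_ms <= self.recovery_start_ms

    def on_ack(self, sent_ms: int, size: int) -> None:
        self.bytes_in_flight -= size
        if self.in_congestion_recovery(sent_ms):
            return               # growth is skipped for packets seen as in recovery
        self.cwnd += size        # stand-in for real window growth


def run_cycles(sender: ToySender, start_ms: int, cycles: int) -> List[int]:
    """Send a full window, wait one RTT, ACK everything; record cwnd per cycle."""
    history = []
    now = start_ms
    for _ in range(cycles):
        sent = now
        packets = sender.cwnd // MSS
        for _ in range(packets):
            sender.on_packet_sent(sent, MSS)
        now += RTT_MS
        for _ in range(packets):
            sender.on_ack(sent, MSS)
        history.append(sender.cwnd)
    return history


# Loss was detected at t=10 ms; the previous burst went out at t=0 ms.
sender = ToySender(last_sent_ms=0, recovery_start_ms=10)
history = run_cycles(sender, start_ms=10, cycles=50)
```

In this model the boundary moves forward by one RTT on every cycle, so each burst's send time stays at or behind it and cwnd never leaves the floor. Real connections add jitter, which is what eventually lets the loop break.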
The escape condition
The loop is not strictly infinite — it breaks when the
<= boundary in in_congestion_recovery() slips behind the
next packet's send time, which happens when the accumulated
scheduler jitter + ACK-processing variance exceeds whatever
rounding slack exists. In Cloudflare's canonical 2026-05-12
measurement, the loop ran for 999 state transitions across
~6.7 seconds, and ~60% of runs did not escape in time to finish
within the test's 10-second timeout.
Why loss-based CCAs specifically
The death spiral is a CUBIC-specific failure mode in the
2026-05-12 post. The control experiment — same test, same
parameters, Reno swapped in for CUBIC —
passed 100% of runs. Reno recovers cleanly because it has no
epoch state variable whose arithmetic can drift past wall-clock
time on every ACK cycle; CUBIC's growth-curve parameter
delta_t = now − epoch_start is the specific structural surface
that the death spiral exploits.
The canonical escape hatch is the fix itself — measure the
idle gap from max(last_ack_time, last_sent_time) rather than
from last_sent_time alone — see
patterns/measure-idle-from-last-ack-not-last-send. With this
fix, the delta at minimum cwnd is ~0 ms (the true
processing-gap time between the last ACK and the next send),
not ~RTT, and the recovery boundary stops advancing into the
future.
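The difference between the two anchors can be shown with a few lines of arithmetic. This is a hedged Python sketch rather than the actual quiche patch: idle_delta_ms and its anchor_on_ack flag are hypothetical names, and times are integer milliseconds for readability.

```python
def idle_delta_ms(now_ms: int, last_sent_ms: int, last_ack_ms: int,
                  anchor_on_ack: bool) -> int:
    """Gap that the idle-period adjustment would add to the recovery boundary."""
    if anchor_on_ack:
        # Fixed behaviour: measure from whichever event happened most recently.
        anchor = max(last_ack_ms, last_sent_ms)
    else:
        # Original behaviour: measure from the last send only.
        anchor = last_sent_ms
    return now_ms - anchor


# Minimum cwnd on a 10 ms RTT: burst sent at t=10, ACKed at t=20,
# next burst sent immediately at t=20.
buggy = idle_delta_ms(20, last_sent_ms=10, last_ack_ms=20, anchor_on_ack=False)
fixed = idle_delta_ms(20, last_sent_ms=10, last_ack_ms=20, anchor_on_ack=True)

# A genuinely idle connection: last ACK at t=20, nothing sent until t=500.
genuine = idle_delta_ms(500, last_sent_ms=10, last_ack_ms=20, anchor_on_ack=True)
```

Here buggy is 10 ms (a full RTT), fixed is 0 ms, and genuine is 480 ms, so the adjustment still applies when the application really was idle.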
Diagnostic fingerprint
- Oscillation period matches RTT. The 2026-05-12 instance showed 14 ms between congestion_avoidance ↔ recovery transitions on a 10 ms-RTT connection. When the oscillation period matches the ACK clock, the trigger is happening once per round trip, which is diagnostic of an ACK-clocked, bytes_in_flight == 0-triggered bug class.
- cwnd pinned at the two-packet floor. 2,700 bytes in the 2026-05-12 instance. If cwnd is stuck at minimum after loss has stopped, a loss-based-CCA idle-detection bug is a prime suspect.
- Throughput flat, no packet-loss events. The dashboard signature is a connection that's not losing packets (so not visibly congested) but isn't growing either. cwnd graphs + qlog state-transition visualisations are how Cloudflare found it.
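The three signals above could be combined into a rough automated check. The following Python sketch assumes the state-transition timestamps, cwnd samples and loss count have already been extracted from a trace; the function name, its parameters and the factor-of-two tolerance are illustrative assumptions, and it does not parse qlog itself.

```python
from statistics import median
from typing import Sequence


def looks_like_min_cwnd_spiral(transition_ms: Sequence[float],
                               cwnd_bytes: Sequence[int],
                               loss_events: int,
                               rtt_ms: float,
                               min_cwnd: int,
                               tolerance: float = 2.0) -> bool:
    """Heuristic: RTT-paced state flapping, cwnd at the floor, no loss."""
    if len(transition_ms) < 3 or not cwnd_bytes:
        return False  # not enough data to judge
    # Gaps between consecutive state transitions.
    gaps = [b - a for a, b in zip(transition_ms, transition_ms[1:])]
    period = median(gaps)
    # "Matches RTT" is taken loosely: within a factor of `tolerance`.
    period_matches = rtt_ms / tolerance <= period <= rtt_ms * tolerance
    pinned = all(c == min_cwnd for c in cwnd_bytes)
    return period_matches and pinned and loss_events == 0


# Shaped like the 2026-05-12 instance: 14 ms gaps on a 10 ms RTT, 2,700-byte cwnd.
stuck = looks_like_min_cwnd_spiral(
    [0, 14, 28, 42, 56], [2700] * 5, loss_events=0, rtt_ms=10, min_cwnd=2700)

# A connection whose window is growing does not match.
healthy = looks_like_min_cwnd_spiral(
    [0, 14, 28, 42, 56], [2700, 4050, 5400, 6750, 8100],
    loss_events=0, rtt_ms=10, min_cwnd=2700)
```

A positive result is only a prompt to inspect the trace by hand; the tolerance would need tuning against real jitter.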
Preconditions
Three conditions must hold simultaneously:
- Real loss event that set congestion_recovery_start_time. Before any loss, the recovery boundary is unset and the buggy branch has no value to advance.
- Post-slow-start (congestion avoidance phase). During slow-start, CUBIC uses Reno-style ACK-driven growth, bypassing the cubic curve and its epoch state.
- cwnd collapsed to the two-packet floor. Only at minimum cwnd does bytes_in_flight reliably hit zero on every ACK cycle.
Absent any of the three, the death spiral doesn't trigger — which is why throughput dashboards didn't see it, and why static code review didn't catch it.
Seen in
- sources/2026-05-12-cloudflare-when-idle-isnt-idle-how-a-linux-kernel-optimization-became-a-quic-bug:
canonical wiki instance. Cloudflare's quiche CUBIC
integration test ran a 10 MB HTTP/3 download under 30% loss
during the first 2 s of a 10 ms-RTT connection; ~60% of runs
failed the 10-second timeout with cwnd locked at 2,700 bytes
and 999 state transitions in ~6.7 s. The fix (three lines of logic adding last_ack_time as the secondary idle-delta anchor) restored 100% pass rate.
Related
- concepts/congestion-window
- concepts/bytes-in-flight
- concepts/cubic-epoch
- concepts/false-idle-detection
- concepts/ack-clock
- concepts/qlog-quic-instrumentation
- systems/cubic-congestion-control
- systems/quiche
- systems/tcp-reno
- patterns/measure-idle-from-last-ack-not-last-send
- patterns/adversarial-corner-case-test-for-recovery
- companies/cloudflare