
CONCEPT Cited by 1 source

Minimum-cwnd death spiral

Definition

A minimum-cwnd death spiral is a self-perpetuating failure mode in a loss-based congestion controller. A severe congestion event first collapses cwnd to its minimum (typically 2 × MSS). After that, the controller fails to grow cwnd back even once loss has completely stopped. The cause is its own idle-detection logic: it misreads the transient bytes_in_flight drain between ACKs as application idleness, and it advances the recovery boundary into the future on the first send of every round trip.

The canonical wiki instance is CUBIC in quiche before the 2026-05-12 fix. There, a 2020 port of a 2017 Linux-kernel idle-period adjustment trapped the connection at the two-packet floor for seconds at a time (Source: sources/2026-05-12-cloudflare-when-idle-isnt-idle-how-a-linux-kernel-optimization-became-a-quic-bug).

The five-step loop

Given a connection at minimum cwnd (two packets), the death spiral cycles once per RTT:

  1. Send and ACK. The sender transmits the entire two-packet window. After one RTT, both packets are ACKed; bytes_in_flight drops to zero.
  2. False idle detection. When the next burst is sent, on_packet_sent() sees bytes_in_flight == 0 and assumes the connection was idle. But it wasn't — it was congestion-limited. The application had data ready to send; the pipe drained because cwnd allowed only two packets.
  3. Inflated delta. The idle delta is computed as now − last_sent_time. At minimum cwnd, last_sent_time is the send time of the previous burst, roughly one RTT earlier, so the delta is ~RTT (e.g. ~14 ms on a 10 ms-RTT connection, plus jitter). The actual idle duration, the gap between the last ACK arriving and the next packet being sent, is effectively zero. The inflated delta is then added to the recovery-start time, which pushes the recovery boundary ~RTT into the future.
  4. Perceived recovery. Because the recovery boundary has been advanced by ~RTT into the future, in_congestion_recovery() returns true for every incoming ACK. Processing the ACK exits recovery and sets the recovery-start time to the ACK time (which is later than last_sent_time). The next send then pushes the boundary further into the future.
  5. Stagnation. CUBIC skips cwnd growth for any packet perceived to be in recovery. cwnd stays pinned at two packets. The pipe drains again on the next ACK. Goto 1.
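The loop can be reproduced with a toy model. This sketch is hypothetical and heavily simplified; it is not quiche's code. It makes these assumptions:

- Times are integer milliseconds, and cwnd is counted in packets.
- Both packets of a burst leave at the same instant.
- Any ACK round outside recovery grows cwnd by one packet.
- The loss event set the recovery boundary at the ACK time t = 100.

The names BuggySender and simulate_buggy are illustrative.

```rust
// Hypothetical toy model of the pre-fix loop; not quiche's actual code.
// Times are integer milliseconds; cwnd is counted in packets.

struct BuggySender {
    cwnd: u64,           // congestion window, in packets
    last_sent_time: u64, // send time of the most recent packet
    recovery_start: u64, // recovery boundary, first set by the loss event
}

impl BuggySender {
    /// A packet counts as "in recovery" if it was sent at or before the boundary.
    fn in_congestion_recovery(&self, sent_time: u64) -> bool {
        sent_time <= self.recovery_start
    }

    /// Steps 2 and 3: the send finds bytes_in_flight == 0, assumes the connection
    /// was idle, and adds the gap since the last *send* (~RTT) to the boundary.
    fn on_send_after_drain(&mut self, now: u64) {
        let delta = now - self.last_sent_time;
        self.recovery_start += delta;
        self.last_sent_time = now;
    }
}

/// Runs `cycles` round trips at minimum cwnd and returns the final cwnd.
/// Each burst is sent `gap` ms after the previous burst's ACKs arrive.
fn simulate_buggy(cycles: u64, rtt: u64, gap: u64) -> u64 {
    // Assumed start: the loss event at t = 100 set the boundary, and the last
    // packet before it was sent one RTT earlier. Requires rtt <= 100.
    let mut s = BuggySender { cwnd: 2, last_sent_time: 100 - rtt, recovery_start: 100 };
    let mut ack_time = 100;
    for _ in 0..cycles {
        // Step 1: the previous window has been fully ACKed, so the pipe is empty.
        let send_time = ack_time + gap;
        s.on_send_after_drain(send_time);
        ack_time = send_time + rtt;
        // Steps 4 and 5: the ACKed burst looks like recovery traffic, so growth is skipped.
        if !s.in_congestion_recovery(send_time) {
            s.cwnd += 1; // simplified growth: one packet per non-recovery round
        }
    }
    s.cwnd
}
```

In this model the boundary stays exactly one RTT ahead of every send. As a result, simulate_buggy(1000, 10, 1) returns 2: a thousand round trips with no loss and no growth.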

The escape condition

The loop is not strictly infinite. It breaks when the recovery boundary tested by the <= comparison in in_congestion_recovery() slips behind the next packet's send time. That happens once accumulated scheduler jitter and ACK-processing variance exceed whatever rounding slack exists. In Cloudflare's canonical 2026-05-12 measurement, the loop ran for 999 state transitions across ~6.7 seconds, against a 10-second test timeout. In ~60% of runs (measured over batches of 100) it failed to escape at all.
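Here is a sketch of the escape condition. It rests on one hypothetical assumption, stated loudly: the jitter that lets the boundary slip is modelled as a fixed `spacing` between the two packets of each burst. The idle gap is measured from the burst's *last* send. So on each cycle the boundary advances by `spacing` less than the first send does, and the lead shrinks until the `<=` check fails. The function names, the starting timestamp, and the `lead` parameter (the ~RTT head start from step 3) are illustrative.

```rust
// Hypothetical model of the escape condition; not quiche's code. Integer milliseconds.

/// The check the loop hinges on: sent at or before the boundary means "in recovery".
fn in_congestion_recovery(sent_time: u64, recovery_start: u64) -> bool {
    sent_time <= recovery_start
}

/// Returns the cycle on which a burst's first packet is no longer in recovery,
/// or None if that does not happen within `max_cycles`.
/// Assumptions: the boundary starts `lead` ms ahead of the first send, and the
/// two packets of each burst leave `spacing` ms apart (standing in for jitter).
fn cycles_until_escape(rtt: u64, lead: u64, spacing: u64, max_cycles: u64) -> Option<u64> {
    let mut first_send: u64 = 1_000; // arbitrary starting timestamp
    let mut recovery_start = first_send + lead;
    for cycle in 1..=max_cycles {
        let last_send = first_send + spacing; // second packet of the burst
        let next_first_send = last_send + rtt; // sent as soon as its ACK arrives
        // Buggy idle adjustment: gap measured from the last send, so it equals rtt here.
        recovery_start += next_first_send - last_send;
        first_send = next_first_send;
        if !in_congestion_recovery(first_send, recovery_start) {
            return Some(cycle);
        }
    }
    None
}
```

Take lead = 10 and spacing = 1. The lead shrinks by 1 ms per round trip, and the check first fails on cycle 11. On cycle 10 the send time equals the boundary, which `<=` still counts as recovery. With spacing = 0 the model never escapes.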

Why loss-based CCAs specifically

In the 2026-05-12 post, the death spiral is specific to CUBIC. The control experiment (same test, same parameters, Reno swapped in for CUBIC) passed 100% of runs. Reno recovers cleanly because it has no epoch state variable whose arithmetic can drift past wall-clock time on every ACK cycle. CUBIC's growth-curve parameter delta_t = now − epoch_start is the specific structural surface that the death spiral exploits.
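This sketch shows why an epoch that drifts past wall-clock time matters. It implements the cubic window function from RFC 9438, W_cubic(t) = C·(t − K)³ + W_max, with the RFC's default constants. Two parts are assumptions for illustration: the function names, and the zero floor on delta_t (mirroring a saturating subtraction). If epoch_start keeps landing in the future, delta_t stays at 0. The target window then stays at its post-reduction value instead of climbing the curve.

```rust
// Sketch of the RFC 9438 cubic window function; names are illustrative.
const C: f64 = 0.4; // RFC default scaling constant
const BETA_CUBIC: f64 = 0.7; // RFC default multiplicative-decrease factor

/// Target window in segments, `t` seconds after the epoch start, for a
/// connection whose window was `w_max` segments before the last reduction.
fn w_cubic(t: f64, w_max: f64) -> f64 {
    // K: time needed to climb back to w_max from the reduced window beta * w_max.
    let k = (w_max * (1.0 - BETA_CUBIC) / C).cbrt();
    C * (t - k).powi(3) + w_max
}

/// delta_t = now - epoch_start, floored at zero (assumed saturating behaviour).
fn delta_t(now: f64, epoch_start: f64) -> f64 {
    (now - epoch_start).max(0.0)
}
```

With w_max = 100 segments, the target is 70 segments (β·W_max) when delta_t is 0 and rises from there as time passes. An epoch that is pushed ahead of `now` on every round trip keeps delta_t at 0.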

The canonical escape hatch is the fix itself — measure the idle gap from max(last_ack_time, last_sent_time) rather than from last_sent_time alone — see patterns/measure-idle-from-last-ack-not-last-send. With this fix, the delta at minimum cwnd is ~0 ms (the true processing-gap time between the last ACK and the next send), not ~RTT, and the recovery boundary stops advancing into the future.
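Here is a minimal sketch of the two idle-gap measurements, using std::time::Instant. The function names are hypothetical, and this is not the actual quiche patch. It only contrasts measuring from last_sent_time alone with measuring from max(last_ack_time, last_sent_time).

```rust
use std::time::{Duration, Instant};

/// Pre-fix: idle gap measured from the last send only. At minimum cwnd this
/// is ~RTT, because the last send happened one round trip ago.
fn idle_delta_buggy(now: Instant, last_sent_time: Instant) -> Duration {
    now.saturating_duration_since(last_sent_time)
}

/// Fixed: idle gap measured from whichever happened later, the last ACK or the
/// last send. At minimum cwnd the last ACK has just arrived, so this is ~0.
fn idle_delta_fixed(now: Instant, last_sent_time: Instant, last_ack_time: Instant) -> Duration {
    now.saturating_duration_since(last_sent_time.max(last_ack_time))
}
```

Suppose the last send was 10 ms ago and the last ACK 50 µs ago. The pre-fix measurement reports about one RTT (10.05 ms) of idleness. The fixed one reports only the 50 µs processing gap.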

Diagnostic fingerprint

  • Oscillation period matches RTT. The 2026-05-12 instance showed 14 ms between congestion_avoidance ↔ recovery transitions on a 10 ms-RTT connection. When the oscillation period matches the ACK clock, the trigger is happening once per round trip — which is diagnostic of an ACK-clocked, bytes_in_flight == 0-triggered bug class.
  • cwnd pinned at the two-packet floor. 2,700 bytes in the 2026-05-12 instance. If cwnd is stuck at minimum after loss has stopped, a loss-based-CCA idle-detection bug is a prime suspect.
  • Throughput flat, no packet-loss events. The dashboard signature is a connection that's not losing packets (so not visibly congested) but isn't growing either. cwnd graphs + qlog state-transition visualisations are how Cloudflare found it.
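The fingerprint can be checked mechanically. The helper below is hypothetical and assumes these inputs:

- The times at which the connection entered recovery, in milliseconds, sorted ascending (e.g. extracted from qlog state-transition events).
- A series of cwnd samples in bytes.
- A smoothed RTT.
- A count of loss events over the same window.

The ±50% tolerance used in the check is an arbitrary choice that accepts the 14 ms-on-10 ms case.

```rust
// Hypothetical log-analysis helper; inputs and thresholds are assumptions.

/// Median gap between consecutive entries into recovery (timestamps sorted, in ms).
fn median_period_ms(recovery_entries_ms: &[u64]) -> Option<u64> {
    if recovery_entries_ms.len() < 2 {
        return None;
    }
    let mut gaps: Vec<u64> = recovery_entries_ms
        .windows(2)
        .map(|w| w[1].saturating_sub(w[0]))
        .collect();
    gaps.sort_unstable();
    Some(gaps[gaps.len() / 2])
}

/// True when all three fingerprint signals are present: oscillation period close
/// to the RTT, cwnd never above the floor, and no loss events.
fn looks_like_death_spiral(
    recovery_entries_ms: &[u64],
    cwnd_samples: &[u64],
    min_cwnd: u64,
    srtt_ms: u64,
    tolerance: f64, // allowed relative deviation of the period from srtt
    loss_events: usize,
) -> bool {
    let period_matches_rtt = match median_period_ms(recovery_entries_ms) {
        Some(p) => (p as f64 - srtt_ms as f64).abs() <= tolerance * srtt_ms as f64,
        None => false,
    };
    let pinned = !cwnd_samples.is_empty() && cwnd_samples.iter().all(|&c| c <= min_cwnd);
    period_matches_rtt && pinned && loss_events == 0
}
```

A median is used rather than a mean so that a single escape or stall does not skew the period estimate.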

Preconditions

Three conditions must hold simultaneously:

  1. Real loss event that set congestion_recovery_start_time. Before any loss, the recovery boundary is unset and the buggy branch has no value to advance.
  2. Post-slow-start (congestion avoidance phase). During slow start, CUBIC uses the same ACK-based slow-start growth as Reno, bypassing the cubic curve and its epoch state.
  3. cwnd collapsed to the two-packet floor. Only at minimum cwnd does bytes_in_flight reliably hit zero on every ACK cycle.
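The three preconditions translate into a simple predicate. This is a sketch over a hypothetical state snapshot. The field names are illustrative rather than quiche's, and the sketch assumes the common convention that a sender is in slow start while cwnd < ssthresh.

```rust
// Hypothetical state snapshot; field names are illustrative, not quiche's.
struct CcSnapshot {
    congestion_recovery_start_time: Option<u64>, // set only by a real loss event
    cwnd: u64,                                   // bytes
    ssthresh: u64,                               // bytes
    max_datagram_size: u64,                      // bytes
}

/// All three preconditions must hold at once for the loop to start.
fn death_spiral_possible(s: &CcSnapshot) -> bool {
    let had_loss = s.congestion_recovery_start_time.is_some(); // precondition 1
    let in_congestion_avoidance = s.cwnd >= s.ssthresh; // precondition 2 (assumed test)
    let at_floor = s.cwnd <= 2 * s.max_datagram_size; // precondition 3
    had_loss && in_congestion_avoidance && at_floor
}
```

With an assumed datagram size of 1,350 bytes, the floor is 2,700 bytes, matching the figure in the 2026-05-12 instance.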

Absent any of the three, the death spiral doesn't trigger — which is why throughput dashboards didn't see it, and why static code review didn't catch it.

Seen in
