CONCEPT Cited by 1 source
Deep Reinforcement Learning congestion control¶
Definition¶
Deep Reinforcement Learning (DRL) congestion control models the sender's pacing decision as a policy learned from experience, rather than relying on hand-tuned heuristics like CUBIC or BBR. A reinforcement-learning agent observes network state (throughput, delay, loss) and adjusts the congestion window or sending rate to maximise a reward signal — typically throughput minus penalties for delay and packet loss.
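The observe-reward-adjust loop above can be sketched in a few lines. This is a minimal illustration, not any real DRL-CC implementation; the `reward` penalty weights, the `policy` callable, and the multiplicative window update are all assumptions chosen for clarity.

```python
def reward(throughput_mbps, rtt_ms, loss_rate,
           delay_penalty=0.1, loss_penalty=10.0):
    """Reward signal of the shape described above: throughput minus
    penalties for delay and packet loss. Weights are illustrative."""
    return throughput_mbps - delay_penalty * rtt_ms - loss_penalty * loss_rate

def step(cwnd, observation, policy):
    """One control interval: the agent maps the observed network state
    (throughput, delay, loss) to a multiplicative adjustment of the
    congestion window. `policy` is any learned function, e.g. a neural
    net emitting a factor in [0.5, 2.0]."""
    action = policy(observation)
    return max(1, int(cwnd * action))
```

In a real system the policy is trained (offline or online) to maximise the cumulative reward over many such intervals, rather than being hand-written.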
DRL-for-CC is an active research axis on top of user-space QUIC. Zalando's 2024-06 post flags it as "the main concept exploited in the research dedicated for protocol improvements in 5G networks" (Source: sources/2024-06-17-zalando-next-level-customer-experience-with-http3-traffic-engineering).
Canonical framing on the wiki¶
Zalando names four DRL CC algorithms from the research literature:
- Aurora — one of the earliest DRL-for-CC systems.
- Eagle
- Orca
- PQB
The wiki does not model each algorithm individually — the canonical framing is that DRL-for-CC is a research axis, not a specific production-ready algorithm. Zalando's post cites lab results showing "higher throughput and round-trip performance under various network settings to compare with competing solutions (e.g. BRR or Remy)." The implementation-to-production transition was still open at the time of the 2024-06 writeup.
Why it's economic now¶
DRL CC requires the flexibility to evaluate new algorithms in production, which kernel-TCP-era CC did not permit. QUIC's user-space CC mutability is the precondition. An A/B-test-per-algorithm deployment pattern works in user space, not in the kernel.
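The A/B-test-per-algorithm pattern can be sketched as a per-connection choice of congestion controller. This is an assumption-laden illustration: the registry names and the `pick_cc_for_connection` helper are hypothetical and not tied to any specific QUIC stack.

```python
import random

# Hypothetical registry of user-space CC implementations.
# In a kernel-TCP world, each candidate would need a kernel module
# and a machine-wide setting; in user space it is a per-connection pick.
CC_REGISTRY = {
    "cubic": lambda: "cubic-controller",
    "bbr": lambda: "bbr-controller",
    "drl-experiment": lambda: "drl-controller",
}

def pick_cc_for_connection(experiment_fraction=0.01, rng=random.random):
    """Route a small fraction of new connections to the experimental
    CC algorithm, keeping the rest on the production default."""
    name = "drl-experiment" if rng() < experiment_fraction else "bbr"
    return name, CC_REGISTRY[name]()
```

Because the decision happens per connection in application code, rolling an experimental algorithm out to 1% of traffic (and rolling it back) needs no kernel change or host reboot.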
The 5G driving case¶
Zalando frames DRL CC as particularly promising for the RAN-bottlenecked 5G environment:
- Heuristic CCs (NewReno, CUBIC) misinterpret RF loss as congestion.
- BBR is the best heuristic option for 5G but still has limits.
- DRL CC can learn the 5G-specific loss / delay distribution directly from experience.
The research claim: a DRL CC trained on 5G-RAN-shaped environments outperforms BBR on that workload because the learned policy is tuned to the specific noise / blockage / handover profile.
Open problems¶
- Generalisation across networks. A DRL policy trained on one network type (e.g. 5G mid-band) may underperform on another (e.g. wired datacentre). Multi-task / online-learning approaches are an active research area.
- Safety / fairness with existing CC. A DRL CC must coexist with TCP-CUBIC flows on the internet today. Unfairness (a DRL agent hogging bandwidth) is an adoption blocker — similar to BBR v1's known CUBIC-coexistence issues.
- Explainability / debuggability. A heuristic CC's behaviour is auditable by reading the algorithm; a DRL CC's is not. For a CDN running on thousands of POPs, incident-triage capability is load-bearing.
- Reward shaping. The reward function must balance throughput, delay, and loss — different services want different trade-offs (e.g. video prefers low delay; bulk transfer prefers high throughput).
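The reward-shaping trade-off in the last bullet can be made concrete with per-service weights. The weight values and service profiles below are illustrative assumptions, not numbers from any published reward function.

```python
from dataclasses import dataclass

@dataclass
class RewardWeights:
    throughput: float  # reward per Mbit/s delivered
    delay: float       # penalty per ms of RTT
    loss: float        # penalty per percentage point of loss

# Illustrative profiles echoing the trade-offs named above:
# video prefers low delay; bulk transfer prefers high throughput.
VIDEO = RewardWeights(throughput=0.5, delay=2.0, loss=5.0)
BULK = RewardWeights(throughput=2.0, delay=0.2, loss=5.0)

def shaped_reward(w, throughput_mbps, rtt_ms, loss_rate):
    """Score one observation under a service-specific weighting."""
    return (w.throughput * throughput_mbps
            - w.delay * rtt_ms
            - w.loss * loss_rate * 100)
```

The same observation scores very differently under the two profiles, which is the point: a policy trained against one reward function is implicitly optimised for one service's trade-off, not all of them.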
Wiki framing¶
DRL-for-CC is the research axis that QUIC's user-space architecture makes economic, the 5G-RAN bottleneck makes valuable, and heuristic CC's limits make necessary. Zalando presents it as a forward-looking research direction rather than a production system; this wiki entry follows that framing.
Seen in¶
- sources/2024-06-17-zalando-next-level-customer-experience-with-http3-traffic-engineering — canonical wiki instance. Zalando positions DRL CC as the research direction expected to drive the next wave of HTTP/3 protocol improvements for 5G.
Related¶
- concepts/user-space-congestion-control — the precondition.
- concepts/bbr-congestion-control — the current state-of-the-art heuristic that DRL CC aims to improve on.
- concepts/cubic-congestion-control — the dominant heuristic in production today.
- concepts/radio-access-network-bottleneck — the driving workload.
- concepts/quic-transport — the protocol substrate.