META

Adopting AV1 for Real-Time Communication (RTC) at Scale

Summary

Meta describes its multi-year effort to deploy the AV1 video codec for real-time communication (RTC) across Messenger and WhatsApp, covering the full production stack: codec encoder/decoder selection for mobile power efficiency, ML-based device eligibility classification, runtime codec complexity adaptation (preset tuning + latency-aware codec switching + asymmetric send/receive codec), accurate VBV-based rate control preventing overshoot/undershoot, and error-resilience strategies (temporal layers + Long-Term Reference frames) for loss recovery without keyframe floods. AV1 is now enabled on the majority of mobile devices in Meta RTC applications.

Key Takeaways

  1. 20%+ bitrate reduction with AV1 vs H.264/AVC under product settings on low-end and mid-range devices — the fundamental motivation. At ≤100 kbps (common in emerging markets), AV1 remains visually clear while H.264 is noticeably blurry (Source: raw article intro).

  2. A low-complexity AV1 encoder preset achieves power parity with H.264/AVC — enabling AV1 on mid-range and low-end phones. An off-the-shelf open-source AV1 encoder drew 14% more power on a Pixel 8; Meta's internal low-complexity encoder eliminates this gap entirely (Source: "Encoder and Decoder Selection" section).

  3. dav1d selected as AV1 decoder after A/B testing among multiple open-source decoders — chosen for superior power efficiency and reliability, with measurable talk-time extension on mobile (Source: "Decoder Selection" section).

  4. Binary size is a first-order deployment constraint at billion-user scale: 600 kB compressed (AV1 encoder + decoder) could consume an entire year's binary size budget for a large app. Meta explored dynamic download (found unreliable), quantization-matrix (QM) table optimization (the QM tool accounted for about 10% of the encoder binary, and its footprint was halved), shared codec libraries, and platform codec reuse (Source: "Binary Size" section).

  5. ML-based device eligibility framework replaces heuristic rules (memory, year, and OS version proved insufficient). A logging pipeline collects low-level real-world performance metrics, the model outputs an rtc_score per device, and that score determines AV1 capability. Iterative refinement (V1.1 → V2) uses a two-tier approach that differentiates high-end from low-end encoding capability (Source: "AV1 Device Eligibility" section).

  6. Three-layer codec complexity adaptation handles the reality that even 2023 smartphones throttle CPU during calls: (a) adaptive encoder preset adjustment monitoring encoding latency; (b) local encoding-latency-aware codec switch to H.264 if AV1 preset lowering is insufficient; (c) peer decoding-latency-aware codec switch via continuous feedback. Also considers battery level (Source: "Codec Complexity Adaptation" section).

  7. Asymmetric codec design — mid-range devices that cannot encode AV1 in real-time can decode it, so they send H.264 but receive AV1 from high-end peers. Significantly increases AV1 coverage across the fleet (Source: "Asymmetric Codec Design" section).

  8. VBV (Video Buffering Verifier) delay as rate-control accuracy metric — target <200 ms. Overshoot causes congestion + freeze; undershoot misleads bandwidth estimation and slows ramp-up. The encoder tracks VBV buffer status frame-by-frame, strictly limits keyframe bitrate, and compensates subsequent frames (Source: "Accurate Rate Control" section).

  9. Reference Picture Resampling (RPR) — AV1 feature allowing resolution changes without generating a keyframe, significantly reducing bitrate spikes and video freeze during dynamic resolution adaptation (Source: "Rate Control Optimization" section).

  10. Temporal Layers (TL) for error resilience: a two-layer structure in which the base layer (TL0) maintains continuity without depending on the enhancement layer. FEC protects the base layer only. TL is enabled adaptively (turned on when loss rises, off when the network recovers) because it reduces compression efficiency when the network is loss-free (Source: "Temporal Layer" section).

  11. Long-Term Reference (LTR) frames with explicit RTP header extension indicators and frame_id ACK feedback. Two recovery paths: reactive (RPSI from the receiver on freeze) and proactive (the sender emits periodic LTRPs, i.e. long-term reference pictures, when elevated loss is detected). LTR frames are combined with periodic higher-quality frames to mitigate temporal-correlation decay. The encoder maintains a bounded reference buffer of size 4 (Source: "Long-Term Reference" section).

  12. Future work: group calls — decoding multiple AV1 streams simultaneously is harder than 1:1; hardware AV1 support across all device tiers is needed for quality improvement (Source: "Meta's Ongoing Journey With AV1" section).
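The decision order in takeaway 6 can be sketched as follows. This is a minimal illustration, not Meta's implementation: the names `CodecState` and `adapt`, the 33 ms frame budget, and the integer complexity scale (lower means cheaper) are all assumptions, since the source gives no thresholds. Battery level and switching back to AV1 are not modeled.

```python
from dataclasses import dataclass

# Hypothetical values for illustration; the source does not publish thresholds.
FRAME_BUDGET_MS = 33.0   # assumed per-frame budget at roughly 30 fps
MIN_COMPLEXITY = 0       # assumed cheapest AV1 encoder preset


@dataclass(frozen=True)
class CodecState:
    codec: str        # "AV1" or "H264"
    complexity: int   # assumed scale: lower = less CPU per frame


def adapt(state: CodecState, local_encode_ms: float, peer_decode_ms: float) -> CodecState:
    """Return the next state given smoothed latency measurements."""
    if state.codec != "AV1":
        return state  # this sketch never switches back to AV1
    # Layer (c): the peer reports it cannot decode AV1 in time.
    if peer_decode_ms > FRAME_BUDGET_MS:
        return CodecState("H264", state.complexity)
    if local_encode_ms > FRAME_BUDGET_MS:
        # Layer (a): first try a cheaper AV1 preset.
        if state.complexity > MIN_COMPLEXITY:
            return CodecState("AV1", state.complexity - 1)
        # Layer (b): already at the cheapest preset, so fall back to H.264.
        return CodecState("H264", state.complexity)
    return state
```

Calling `adapt` once per measurement window with a sustained 40 ms encode time steps the complexity down one level at a time and only then switches codec, which mirrors the ordering of layers (a) and (b) in the summary.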

Operational Numbers

| Metric | Value |
|---|---|
| AV1 vs H.264 bitrate reduction | ≥20% (offline tests, low/mid-range devices) |
| Open-source AV1 encoder power increase (Pixel 8) | 14% vs H.264 |
| Target VBV delay for RTC | <200 ms |
| RTC video bitrate range (emerging markets) | 10–400 kbps |
| Challenging quality threshold | <100 kbps |
| AV1 binary size addition (libAOM example) | 1.7 MB uncompressed / 600 kB compressed |
| QM tool share of encoder library size | ~10% |
| LTR reference buffer size | 4 |
| Acceptable end-to-end video latency | <300 ms |
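To make the VBV delay target concrete, here is a simplified leaky-bucket sketch. It assumes the buffer fills by each encoded frame's size and drains at the target bitrate, with delay defined as buffer fullness divided by bitrate. The function names and the model itself are illustrative assumptions; the source does not describe how Meta computes the metric, only the <200 ms target.

```python
def vbv_delays_ms(frame_bits, bitrate_bps, fps):
    """Per-frame VBV delay (ms) under a simple leaky-bucket model (assumed)."""
    drain_per_frame = bitrate_bps / fps  # bits drained during one frame interval
    fullness = 0.0
    delays = []
    for bits in frame_bits:
        # Add the frame, drain one interval, never go below empty.
        fullness = max(0.0, fullness + bits - drain_per_frame)
        delays.append(1000.0 * fullness / bitrate_bps)
    return delays


def overshoot_frames(delays, target_ms=200.0):
    """Indices of frames whose VBV delay exceeds the target."""
    return [i for i, d in enumerate(delays) if d > target_ms]


# A 40 kbit keyframe followed by small frames at 100 kbps, 10 fps.
delays = vbv_delays_ms([40_000, 5_000, 5_000, 5_000], bitrate_bps=100_000, fps=10)
print(delays)                    # [300.0, 250.0, 200.0, 150.0]
print(overshoot_frames(delays))  # [0, 1]
```

In this example the oversized keyframe pushes the delay to 300 ms, and it takes two undersized frames to get back to the target, which is why the summary stresses limiting keyframe bitrate and compensating in subsequent frames.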

Caveats

  • No specific numbers on AV1 fleet coverage percentage, call-quality improvement metrics, or A/B test results disclosed.
  • ML model architecture (V1.1 / V2) not detailed beyond "uses low-level performance metrics."
  • Exact encoder identity not disclosed (referred to as "internal low-complexity encoder").
  • No timeline for group call AV1 deployment.
  • Power consumption parity claim is for the internal encoder only — not a general AV1 property.
