Adopting AV1 for Real-Time Communication (RTC) at Scale

Summary

Meta describes its multi-year effort to deploy the AV1 video codec for real-time communication (RTC) across Messenger and WhatsApp. The account covers the full production stack: encoder and decoder selection for mobile power efficiency; ML-based device eligibility classification; runtime codec complexity adaptation (preset tuning, latency-aware codec switching, and asymmetric send/receive codecs); accurate VBV-based rate control that prevents overshoot and undershoot; and error-resilience strategies (temporal layers and Long-Term Reference frames) that recover from loss without keyframe floods. AV1 is now enabled on the majority of mobile devices in Meta RTC applications.

Key Takeaways

20%+ bitrate reduction with AV1 vs H.264/AVC under product settings on low-end and mid-range devices — the fundamental motivation. At ≤100 kbps (common in emerging markets), AV1 remains visually clear while H.264 is noticeably blurry (Source: raw article intro).
A low-complexity AV1 encoder preset achieves power parity with H.264/AVC — enabling AV1 on mid-range and low-end phones. An off-the-shelf open-source AV1 encoder drew 14% more power on a Pixel 8; Meta's internal low-complexity encoder eliminates this gap entirely (Source: "Encoder and Decoder Selection" section).
dav1d selected as AV1 decoder after A/B testing among multiple open-source decoders — chosen for superior power efficiency and reliability, with measurable talk-time extension on mobile (Source: "Decoder Selection" section).
Binary size is a first-order deployment constraint at billion-user scale: the 600 kB compressed cost of an AV1 encoder plus decoder could consume an entire year's binary size budget for a large app. Meta pursued dynamic download (found unreliable), optimization of the quantization-matrix (QM) tables (about 10% of the encoder binary, a share the optimization halved), shared codec libraries, and platform codec reuse (Source: "Binary Size" section).
ML-based device eligibility framework replaces heuristic rules, because rules based on memory, device year, and OS version proved insufficient. A logging pipeline collects low-level real-world performance metrics, the model outputs an rtc_score per device, and that score determines AV1 capability. The model was refined iteratively (V1.1, then V2), with a two-tier approach that differentiates high-end from low-end encoding capability (Source: "AV1 Device Eligibility" section).
Three-layer codec complexity adaptation handles the reality that even 2023 smartphones throttle the CPU during calls: (a) adaptive encoder preset adjustment driven by monitored encoding latency; (b) a local, encoding-latency-aware switch to H.264 when lowering the AV1 preset is insufficient; (c) a peer decoding-latency-aware codec switch driven by continuous feedback. Battery level is also taken into account (Source: "Codec Complexity Adaptation" section).
Asymmetric codec design — mid-range devices that cannot encode AV1 in real-time can decode it, so they send H.264 but receive AV1 from high-end peers. Significantly increases AV1 coverage across the fleet (Source: "Asymmetric Codec Design" section).
VBV (Video Buffering Verifier) delay serves as the rate-control accuracy metric, with a target below 200 ms. Overshoot causes congestion and video freezes; undershoot misleads bandwidth estimation and slows ramp-up. The encoder tracks VBV buffer status frame by frame, strictly limits keyframe bitrate, and compensates in subsequent frames (Source: "Accurate Rate Control" section).
Reference Picture Resampling (RPR) — AV1 feature allowing resolution changes without generating a keyframe, significantly reducing bitrate spikes and video freeze during dynamic resolution adaptation (Source: "Rate Control Optimization" section).
Temporal Layers (TL) for error resilience: a two-layer structure in which the base layer (TL0) maintains continuity without depending on the enhancement layer. FEC protects the base layer only. TL is enabled adaptively, turned on when loss rises and off when the network recovers, because TL reduces compression efficiency when there is no packet loss (Source: "Temporal Layer" section).
Long-Term Reference (LTR) frames are signaled with explicit RTP header extension indicators and acknowledged via frame_id ACK feedback. There are two recovery paths: reactive (the receiver sends RPSI, a Reference Picture Selection Indication, on freeze) and proactive (the sender emits periodic LTRPs when it detects elevated loss). LTR frames are combined with periodic higher-quality frames to mitigate temporal-correlation decay. The encoder maintains a bounded reference buffer of size 4 (Source: "Long-Term Reference" section).
Future work: group calls — decoding multiple AV1 streams simultaneously is harder than 1:1; hardware AV1 support across all device tiers is needed for quality improvement (Source: "Meta's Ongoing Journey With AV1" section).
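
The three-layer complexity adaptation summarized in the takeaways above can be pictured as a small decision function. The sketch below is only an illustration: the thresholds, the preset indices, the names `EncoderState` and `adapt`, and the way battery level feeds into the decision are all assumptions, since the source does not disclose them.

```python
from dataclasses import dataclass

# All thresholds and preset indices below are illustrative assumptions,
# not values from the source.
FRAME_BUDGET_MS = 33.0   # roughly one frame interval at 30 fps
LOW_BATTERY = 0.15       # fraction of charge below which AV1 is avoided
FASTEST_PRESET = 3       # hypothetical index of the cheapest AV1 preset


@dataclass(frozen=True)
class EncoderState:
    codec: str = "AV1"
    preset: int = 0      # 0 = best quality, FASTEST_PRESET = cheapest


def adapt(state, encode_ms, peer_decode_ms, battery):
    """Return the next encoder state given the latest latency feedback."""
    if state.codec != "AV1":
        return state  # already on the fallback codec; this sketch never switches back
    # (c) The peer reports it cannot decode AV1 in time, or battery is low.
    if peer_decode_ms > FRAME_BUDGET_MS or battery < LOW_BATTERY:
        return EncoderState("H264", state.preset)
    if encode_ms > FRAME_BUDGET_MS:
        # (a) Try a cheaper AV1 preset first.
        if state.preset < FASTEST_PRESET:
            return EncoderState("AV1", state.preset + 1)
        # (b) No cheaper preset is left, so switch codec.
        return EncoderState("H264", state.preset)
    return state
```

Calling `adapt` once per feedback interval with a persistently slow encoder steps through the presets before falling back to H.264, which mirrors the ordering of layers (a) and (b). A production system would presumably also need a path back to AV1 once conditions improve, which this sketch omits.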

Operational Numbers

Metric	Value
AV1 vs H.264 bitrate reduction	≥20% (offline tests, low/mid-range devices)
Open-source AV1 encoder power increase (Pixel 8)	14% vs H.264
Target VBV delay for RTC	<200 ms
RTC video bitrate range (emerging markets)	10–400 kbps
Bitrate below which video quality becomes challenging	<100 kbps
AV1 binary size addition (libAOM example)	1.7 MB uncompressed / 600 kB compressed
QM tool share of encoder library size	~10%
LTR reference buffer size	4
Acceptable end-to-end video latency	<300 ms
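
To make the VBV delay target in the table concrete, the sketch below models the VBV as a simple leaky bucket: each encoded frame adds its bits to the buffer, and the channel drains the buffer at the target bitrate. This is a textbook approximation under assumed parameters, not Meta's rate controller; the function names and the example figures are hypothetical.

```python
def vbv_delays_ms(frame_bits, bitrate_bps, fps):
    """Per-frame VBV delay for a leaky bucket drained at the target bitrate."""
    drain_per_frame = bitrate_bps / fps
    level = 0.0
    delays = []
    for bits in frame_bits:
        level += bits                               # frame enters the buffer
        delays.append(1000.0 * level / bitrate_bps)  # time needed to drain it
        level = max(0.0, level - drain_per_frame)   # channel drains one interval
    return delays


def max_frame_bits(level_bits, bitrate_bps, target_delay_ms=200.0):
    """Largest frame that keeps the VBV delay at or under the target."""
    return max(0.0, bitrate_bps * target_delay_ms / 1000.0 - level_bits)
```

For example, at an assumed 300 kbps and 30 fps, an 80,000-bit keyframe produces a delay of roughly 267 ms, which exceeds the 200 ms target, whereas `max_frame_bits(0.0, 300_000)` caps the keyframe at 60,000 bits. The bits removed from the keyframe then have to be recovered by the frames that follow, which is the compensation step the takeaways describe.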

Systems & Concepts Extracted

Systems

systems/av1-codec — the codec being deployed
systems/dav1d — open-source AV1 decoder selected for Meta RTC
systems/messenger — Meta RTC application surface
systems/whatsapp — Meta RTC application surface
systems/meta-rtc-platform — Meta's real-time communication infrastructure (new)

Concepts

concepts/rtc-codec-rate-control — VBV-based rate control for real-time video (new)
concepts/vbv-delay — leaky-bucket metric for CBR accuracy (new)
concepts/codec-complexity-adaptation — runtime adjustment of encoder preset/codec based on device capability (new)
concepts/device-eligibility-ml — ML-based device capability classification (new)
concepts/asymmetric-codec-design — different send/receive codecs exploiting encode/decode asymmetry (new)
concepts/temporal-layer-error-resilience — temporal scalability for loss tolerance (new)
concepts/long-term-reference-frame — pinned reference frames for fast loss recovery (new)
concepts/reference-picture-resampling — resolution change without keyframe (new)
concepts/binary-size-bloat — existing page, extended with AV1 mobile datum
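
The asymmetric codec concept listed above reduces to choosing the codec per direction rather than per call. The sketch below illustrates that idea only; the `Device` fields and the `send_codec` helper are hypothetical names, and real negotiation would go through the signaling layer, which the source does not describe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    can_encode_av1: bool   # hypothetical capability flags, e.g. from eligibility scoring
    can_decode_av1: bool


def send_codec(sender, receiver):
    """Codec for the sender-to-receiver direction, chosen independently per direction."""
    if sender.can_encode_av1 and receiver.can_decode_av1:
        return "AV1"
    return "H264"


high_end = Device("high-end", can_encode_av1=True, can_decode_av1=True)
mid_range = Device("mid-range", can_encode_av1=False, can_decode_av1=True)

# The two directions of the same call may use different codecs.
codecs = (send_codec(high_end, mid_range), send_codec(mid_range, high_end))
```

Here `codecs` is `("AV1", "H264")`: the mid-range device receives AV1 from its high-end peer while sending H.264, which is how decode-only devices add to AV1 coverage.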

Patterns

patterns/ml-based-device-eligibility — use production telemetry + ML to classify device codec capability (new)
patterns/adaptive-encoder-preset — monitor encoding latency → adjust complexity at runtime (new)
patterns/latency-aware-codec-switching — switch codec based on encoding/decoding latency feedback (new)
patterns/asymmetric-send-receive-codec — send lower-complexity codec, receive higher-complexity codec on weaker devices (new)
patterns/adaptive-temporal-layer-activation — enable/disable TL based on network loss signal (new)
patterns/ltr-proactive-and-reactive-recovery — dual-path LTR recovery: RPSI on freeze + periodic LTRP on elevated loss (new)
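
The sender-side bookkeeping behind LTR recovery can be sketched as a bounded buffer of LTR frame ids plus the set of ids the receiver has acknowledged. The class and method names below are hypothetical, and the eviction and selection policies are assumptions; only the buffer size of 4 and the frame_id ACK feedback come from the source.

```python
from collections import deque


class LtrSender:
    """Tracks which LTR frames are still held and which the receiver has ACKed."""

    def __init__(self, capacity=4):
        self.buffer = deque(maxlen=capacity)  # frame ids of LTR frames still held
        self.acked = set()                    # held frame ids the receiver confirmed

    def mark_ltr(self, frame_id):
        # When the buffer is full, appending evicts the oldest LTR frame.
        if len(self.buffer) == self.buffer.maxlen:
            self.acked.discard(self.buffer[0])
        self.buffer.append(frame_id)

    def on_ack(self, frame_id):
        # Ignore ACKs for frames that have already been evicted.
        if frame_id in self.buffer:
            self.acked.add(frame_id)

    def recovery_reference(self):
        """Newest held LTR frame the receiver has, or None if a keyframe is needed."""
        for frame_id in reversed(self.buffer):
            if frame_id in self.acked:
                return frame_id
        return None
```

On an RPSI from the receiver (the reactive path) or on a timer during elevated loss (the proactive path), the sender would encode the next frame against `recovery_reference()`; a `None` result means no acknowledged LTR frame remains and a keyframe is the only safe option.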

Caveats

No specific numbers on AV1 fleet coverage percentage, call-quality improvement metrics, or A/B test results disclosed.
ML model architecture (V1.1 / V2) not detailed beyond "uses low-level performance metrics."
Exact encoder identity not disclosed (referred to as "internal low-complexity encoder").
No timeline for group call AV1 deployment.
Power consumption parity claim is for the internal encoder only — not a general AV1 property.

Source

sources/2024-06-13-meta-mlow-metas-low-bitrate-audio-codec — Meta's RTC audio codec (MLow); sibling on the RTC media axis
sources/2026-04-09-meta-escaping-the-fork-webrtc-modernization — Meta's WebRTC modernization; shared RTC infrastructure substrate
sources/2026-03-09-meta-ffmpeg-at-meta-media-processing-at-scale — Meta's video encoding infrastructure (VOD axis)
systems/av1-codec — the codec's wiki page
systems/dav1d — the decoder selected
companies/meta