META 2024-06-13

Meta — MLow: Meta's low bitrate audio codec

Summary

Meta Engineering's 2024-06-13 post announces MLow (Meta Low Bitrate), a new real-time communication (RTC) audio codec launched on Instagram and Messenger calls and rolling out on WhatsApp. The design target was explicit: deliver high-quality audio at very low bitrates on low-end devices, where a significant share of Meta's call volume lives. ML-based codecs such as Meta's own Encodec reach great quality at low bitrates, but only on expensive handsets, so Meta built MLow on classic CELP (Code Excited Linear Prediction) DSP techniques with optimisations in excitation generation, parameter quantization, coding schemes, and split-band encoding. The result: roughly 2× the POLQA MOS of Opus at 6 kbps WB (3.9 for MLow vs 1.89 for Opus) at 10% lower computational complexity than Opus, and enough bitrate headroom to pack inband FEC at 14 kbps, below Opus's 19 kbps inband-FEC floor, for packet-loss resilience.

Key takeaways

The quality / bitrate / complexity triangle is the frame for codec design. "Good codecs can strike a balance among the trio of quality, bitrate, and complexity by exploiting deep knowledge about the nature of the audio signal as well as by using psychoacoustics." Figure 1 of the post visualises this explicitly. (Source text; see concepts/quality-bitrate-complexity-tradeoff)
New codecs are rare events. "The last widely known, good open-source codec was Opus, released in 2012" — 12 years to the next widely-deployed general-purpose codec. Meta had used Opus for all its RTC needs previously. MLow is the first major Meta-proprietary codec shipped at RTC scale. (Source text)
Poor-network calls are a non-trivial chunk of Meta's call volume. "A significant chunk of calls have poor network connections throughout or for part of a call." A bandwidth estimation module (BWE) drives the codec down as the network degrades; video calls compress the audio budget further. Opus's lowest operating point is 6 kbps in NarrowBand (NB) mode (0–4 kHz), which "does not adequately capture all the sound frequencies produced by human voices." (Source text; see concepts/narrowband-vs-wideband-audio, patterns/bandwidth-adaptive-codec-mode)
ML-based codecs don't reach Meta's install base. Meta's own Encodec (October 2022) achieves "amazingly crisp audio quality at very low bitrates" but "only the very high-end (expensive) mobile handsets are able to run these codecs reliably, while users running on lower-end devices continue to experience audio quality issues in low-bitrate conditions. So the net impact of these newer computationally expensive codecs is actually limited to a small portion of users." Canonical statement that ML codecs fail the low-end-device inclusion criterion; see concepts/low-end-device-inclusion, patterns/classic-dsp-over-ml-for-compute-constrained. (Source text)
Concrete install-base numbers constrain design. "More than 20 percent of our calls are made on ARMv7 devices, and 10's of millions of daily calls on WhatsApp are on 10-year-old-plus devices." These numbers are load-bearing on the decision to rule out ML codecs. (Source text)
Quality and complexity wins, side-by-side. "MLow achieves two-times-better quality than Opus (POLQA MOS 1.89 vs 3.9 @ 6kbps WB). Even more importantly, we are able to achieve this great quality while keeping MLow's computational complexity 10 percent lower than that of Opus." Note the direction of the scale: POLQA MOS runs from 1 to 5 and higher is better, so 3.9 is MLow and 1.89 is Opus. (Source text; see concepts/polqa-mos-metric)
A lower bitrate floor unlocks more aggressive FEC packing. "Being able to encode high-quality audio at lower bitrates also unlocks more effective Forward Error Correction (FEC) strategies. Compared with Opus, with MLow we can afford to pack FEC at much lower bitrates, which significantly helps to improve the audio quality in packet loss scenarios." At 14 kbps with 30% receiver-side packet loss, Opus cannot encode inband FEC at all: "It needs a minimum of 19 kbps to encode any inband FEC at 10 percent packet loss." MLow can. This is the aggressive-FEC-at-low-bitrate pattern: the bitrate headroom won by the codec gets spent on redundancy, not fidelity. (Source text; see concepts/forward-error-correction-audio, patterns/aggressive-fec-at-low-bitrate)
Architecture is CELP + split-band + range encoding. "MLow builds on the concepts of a classic CELP (Code Excited Linear Prediction) codec with advancements around excitation generation, parameter quantization, and coding schemes." The encoder splits the input signal into two bands (low-frequency and high-frequency), encodes each band separately while sharing information between them, then passes the output through a range encoder for final compression; the decoder runs the inverse. Split-band processing with cheap high-band encoding lets MLow deliver SuperWideBand (32 kHz sampling) at a much lower bitrate than Opus. (Source text; see concepts/split-band-audio-coding)
Shipped at RTC scale, with WhatsApp still rolling out. "We have already fully launched MLow to all Instagram and Messenger calls and are actively rolling it out on WhatsApp", with "incredible improvement in user engagement driven by better audio quality." This is a production codec, not a research prototype. (Source text)
End-to-end encryption preserved, heavy packet-loss work ongoing. "MLow has greatly enhanced audio quality on low-end devices while still ensuring calls are end-to-end encrypted." Future work: "improving the audio recovery in heavy packet loss networks by pumping out more redundant audio, which MLow allows us to do efficiently", that is, doubling down on the FEC headroom described in the FEC takeaway above. (Source text)
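
To make the inband-FEC idea in the takeaways concrete, here is a minimal sketch of redundancy-based recovery, assuming the simplest scheme in which every packet carries its own frame plus a copy of the previous frame. The post does not describe MLow's actual FEC layout, so the function names (`packetize`, `receive`), the packet dictionary, and the string "frames" are all hypothetical stand-ins; in a real codec the redundant copy would typically be encoded at a lower bitrate than the primary.

```python
def packetize(frames):
    """Packet n carries frame n plus a redundant copy of frame n-1 (assumed scheme)."""
    packets = []
    for seq, frame in enumerate(frames):
        redundant = frames[seq - 1] if seq > 0 else None
        packets.append({"seq": seq, "primary": frame, "redundant": redundant})
    return packets


def receive(packets, lost_seqs):
    """Rebuild the frame sequence; None marks a frame that could not be recovered."""
    recovered = [None] * len(packets)
    for packet in packets:
        seq = packet["seq"]
        if seq in lost_seqs:
            continue  # this packet never arrived
        recovered[seq] = packet["primary"]
        # If the previous packet was lost, fill the gap from the redundant copy.
        if seq > 0 and recovered[seq - 1] is None:
            recovered[seq - 1] = packet["redundant"]
    return recovered


# Strings stand in for encoded audio frames.
frames = [f"frame{i}" for i in range(6)]
received = receive(packetize(frames), lost_seqs={1, 3, 4})
print(received)
```

In this run the isolated loss (packet 1) and the second loss of the burst (packet 4) are repaired from the following packet, while frame 3 stays missing because the packet that carried its copy was lost too. Covering bursts needs more redundancy per packet, which is where a lower codec bitrate leaves room to spend.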

Systems extracted

systems/mlow-codec — MLow (Meta Low Bitrate), Meta's RTC audio codec. CELP-based + split-band + range-encoded. About 2× Opus's POLQA MOS at 6 kbps WB; 10% lower complexity; SuperWideBand @ 32 kHz at low bitrate. Launched on Instagram and Messenger calls; rolling out on WhatsApp.
systems/opus-codec — the 2012 open-source general-purpose codec Meta used for all RTC before MLow. Benchmark point for MLow. Lowest operating point is 6 kbps in NarrowBand mode; 19 kbps floor for inband FEC at 10% loss.
systems/meta-encodec — Meta's AI/ML-based audio compression codec (October 2022). High quality at very low bitrates; too compute-expensive for low-end handsets. Referenced as the canonical ML-codec contrast that motivated MLow's DSP direction.

Concepts extracted

New:

concepts/audio-codec — the general primitive: compress raw audio (e.g. 768 kbps PCM mono @ 48 kHz / 16-bit) down to 25–30 kbps (modern codecs) through DSP + psychoacoustic modelling.
concepts/psychoacoustic-compression — exploit the human auditory model (masking, perceptual thresholds, voice-specific structure) to discard information the listener won't perceive.
concepts/quality-bitrate-complexity-tradeoff — the three-axis constraint triangle that governs codec design.
concepts/forward-error-correction-audio — FEC (inband / out-of-band redundancy) in RTC audio streams; bitrate-gated.
concepts/narrowband-vs-wideband-audio — NB (0–4 kHz) / WB (0–8 kHz) / SuperWideBand (0–16 kHz, 32 kHz sampling) / FullBand distinction; NB fails on human voice at 6 kbps.
concepts/split-band-audio-coding — split the input spectrum into bands, encode each separately with information shared; enables cheap high-band encoding.
concepts/polqa-mos-metric — Perceptual Objective Listening Quality Analysis Mean Opinion Score; 1–5 scale objective voice-quality measure used to benchmark codecs.
concepts/low-end-device-inclusion — the product constraint that codec/ML/rendering choices must serve the low-end device population (ARMv7, 10+ year-old handsets) — not just flagships.
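
As an illustration of the split-band concept listed above, the following sketch splits a signal into a low band and a high band, each at half the input sample rate, and then merges them back. It uses the two-tap Haar (sum/difference) filter pair purely because it is the shortest filter bank that reconstructs exactly; the post does not disclose MLow's filter bank, and practical codecs generally use longer filters. `split_bands` and `merge_bands` are hypothetical names.

```python
def split_bands(samples):
    """Split into (low, high) bands, each with half as many samples (Haar pair)."""
    if len(samples) % 2:
        raise ValueError("expected an even number of samples")
    pairs = [(samples[i], samples[i + 1]) for i in range(0, len(samples), 2)]
    low = [(a + b) / 2 for a, b in pairs]   # local average: low-frequency content
    high = [(a - b) / 2 for a, b in pairs]  # local difference: high-frequency content
    return low, high


def merge_bands(low, high):
    """Inverse of split_bands: interleave the reconstructed sample pairs."""
    samples = []
    for l, h in zip(low, high):
        samples.extend((l + h, l - h))
    return samples


signal = [4, 2, 6, 6, 1, 3]
low, high = split_bands(signal)
print(low, high)
print(merge_bands(low, high))
```

Once the bands are separate, an encoder is free to spend most of its bits on the low band and code the high band coarsely, which is the kind of asymmetry the split-band entry describes.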

Patterns extracted

patterns/classic-dsp-over-ml-for-compute-constrained — when ML-based alternatives exceed the compute budget on target devices, stay on classic DSP and push its parameter/coding design further. Meta's MLow-over-Encodec decision is the canonical example.
patterns/aggressive-fec-at-low-bitrate — spend bitrate headroom won by a better codec on redundancy (FEC) rather than on fidelity, to improve perceived quality under packet loss.
patterns/bandwidth-adaptive-codec-mode — a bandwidth-estimation module (BWE) drives codec operating point as network conditions change; a lower operating-point floor is a first-class codec feature.
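
The bandwidth-adaptive and FEC patterns above can be sketched together as a small decision function. This is a hypothetical illustration, not Meta's BWE logic: `OperatingPoint` and `choose_operating_point` are invented names, the 19 kbps Opus FEC floor is the figure quoted in the post, and treating 14 kbps as MLow's FEC floor and 6 kbps as both codecs' lowest operating point are assumptions (the post only shows that MLow can carry FEC at 14 kbps).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OperatingPoint:
    codec_kbps: float
    fec_enabled: bool


def choose_operating_point(
    audio_budget_kbps: float,
    codec_floor_kbps: float,
    fec_floor_kbps: float,
) -> Optional[OperatingPoint]:
    """Map the audio budget from bandwidth estimation to a codec operating point."""
    if audio_budget_kbps < codec_floor_kbps:
        return None  # below the codec's lowest operating point
    return OperatingPoint(
        codec_kbps=audio_budget_kbps,
        fec_enabled=audio_budget_kbps >= fec_floor_kbps,
    )


# Same 14 kbps audio budget, different codec floors (values partly assumed).
opus_like = choose_operating_point(14, codec_floor_kbps=6, fec_floor_kbps=19)
mlow_like = choose_operating_point(14, codec_floor_kbps=6, fec_floor_kbps=14)
print(opus_like)
print(mlow_like)
```

With the same budget, only the codec with the lower FEC floor gets redundancy, which is why a lower floor counts as a codec feature in its own right.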

Operational numbers

Raw audio: 768 kbps (mono, 48 kHz, 16-bit depth)
Modern codec target: 25–30 kbps (reduction factor ~25–30×)
Opus floor: 6 kbps NB
MLow @ 6 kbps WB: POLQA MOS 3.9 (vs Opus 1.89 @ 6 kbps WB)
MLow compute: ~10% lower than Opus
Opus inband FEC: 19 kbps minimum @ 10% packet loss
MLow inband FEC: 14 kbps feasible @ 30% packet loss (sample-level evidence)
Low-end-device install base: >20% ARMv7, 10s of millions of daily WhatsApp calls on 10+ year-old devices
Development timeline: late 2021 → mid 2024 (~2.5 years)
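
The arithmetic behind several of these figures can be checked directly. The short script below recomputes the raw PCM bitrate, the compression ratios at a few coded bitrates, and the POLQA MOS ratio from the numbers listed above; the variable and function names are illustrative only.

```python
SAMPLE_RATE_HZ = 48_000
BIT_DEPTH = 16
CHANNELS = 1  # mono


def reduction_factor(raw_kbps, coded_kbps):
    """How many times smaller the coded stream is than the raw stream."""
    return raw_kbps / coded_kbps


# Raw PCM bitrate: samples per second x bits per sample x channels.
raw_kbps = SAMPLE_RATE_HZ * BIT_DEPTH * CHANNELS / 1000
print(f"raw PCM: {raw_kbps:.0f} kbps")

for coded_kbps in (30, 25, 6):
    factor = reduction_factor(raw_kbps, coded_kbps)
    print(f"{coded_kbps} kbps -> about {factor:.1f}x smaller")

# Ratio of the quoted POLQA MOS scores (MLow 3.9 vs Opus 1.89).
mos_ratio = 3.9 / 1.89
print(f"MOS ratio: {mos_ratio:.2f}")
```

At the 6 kbps operating point the coded stream is 128 times smaller than raw PCM, well beyond the 25–30 kbps range cited for modern codecs.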

Caveats

The post does not disclose MLow's internal algorithmic detail — no bitstream syntax, no excitation-generation method, no range-coder parameters. Architecture is described at a block-diagram level only.
The quoted 1.89 vs 3.9 POLQA figures compare Opus and MLow both in WB at 6 kbps, whereas the audio samples compare Opus NB at 6 kbps against MLow WB at 6 kbps. The "two-times better quality" claim is a ratio of POLQA MOS scores; MOS is a bounded 1–5 scale that is not linear in perceived quality, so "2×" is a ratio of scores, not a doubling of perceived quality.
The FEC audio sample (30% loss at 14 kbps) is not compared against Opus with FEC at 19+ kbps, because Opus cannot do inband FEC at 14 kbps at all; a comparison at equal redundancy is left implicit.
Full WhatsApp rollout is described as "actively rolling out" at publication — not complete.
The post does not say whether MLow is or will become open source.

Source

Original: https://engineering.fb.com/2024/06/13/web/mlow-metas-low-bitrate-audio-codec/
Raw markdown: raw/meta/2024-06-13-mlow-metas-low-bitrate-audio-codec-2d2b2f80.md

companies/meta — second Meta Engineering source on the wiki (after Meta's 2024-06-12 LLM-training-at-scale post). Different problem domain (RTC audio vs. GenAI training) but same Meta-scale / low-end-device-inclusion commitment.
concepts/quantization — shares the "compute-constrained device forces classic/simpler technique over ML" spirit with Instacart's decision not to ship FP8 quantization despite its speed win, and with Cloudflare's Unweight (lossless) over lossy alternatives.