CONCEPT Cited by 1 source
Audio codec
Definition
An audio codec (coder-decoder) compresses raw audio for transmission or storage and reconstructs it at the receiver. For real-time communication (RTC), the codec is the critical component that makes voice and video call audio deliverable over the public internet. Compression ratios are high: a raw mono stream at 48 kHz / 16-bit runs at 768 kbps (48,000 samples/s × 16 bits), and modern codecs compress that to 25–30 kbps, a factor of roughly 25–30×, by exploiting psychoacoustics and voice-signal structure (Source: sources/2024-06-13-meta-mlow-metas-low-bitrate-audio-codec).
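The arithmetic behind those figures can be checked with a short sketch. The helper name `pcm_bitrate_kbps` is hypothetical and introduced only for illustration; the sample rate, bit depth, and the 25–30 kbps range are the ones quoted above.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bit_depth: int, channels: int = 1) -> float:
    """Bitrate of uncompressed PCM audio in kbps (1 kbps = 1000 bits/s)."""
    return sample_rate_hz * bit_depth * channels / 1000


raw_kbps = pcm_bitrate_kbps(48_000, 16)  # mono, 48 kHz, 16-bit
ratio_at_30 = raw_kbps / 30              # codec output at 30 kbps
ratio_at_25 = raw_kbps / 25              # codec output at 25 kbps

print(f"raw: {raw_kbps:.0f} kbps")
print(f"compression ratio: {ratio_at_30:.1f}x to {ratio_at_25:.1f}x")
# raw: 768 kbps
# compression ratio: 25.6x to 30.7x
```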
What a codec trades off
Every codec design resolves three axes simultaneously — see concepts/quality-bitrate-complexity-tradeoff:
- Quality — how faithfully the decoded audio matches the source (objectively measured via POLQA MOS and its predecessors, subjectively via listening tests).
- Bitrate — compressed bits per second. Lower bitrate means more compression, fewer bytes on the wire.
- Complexity — encoder + decoder CPU cost. Load-bearing for RTC on low-end handsets — see concepts/low-end-device-inclusion.
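To make the bitrate axis concrete, the sketch below converts a codec bitrate into payload bytes per packet. The 20 ms frame duration is an assumption (a common choice in RTC, not stated in the source), `payload_bytes_per_frame` is a hypothetical helper, and packet headers are ignored.

```python
import math


def payload_bytes_per_frame(bitrate_bps: int, frame_ms: int = 20) -> int:
    """Bytes of codec payload produced per frame, rounded up to whole bytes."""
    bits_per_frame = bitrate_bps * frame_ms / 1000
    return math.ceil(bits_per_frame / 8)


# Compare the compressed range from the text with raw 48 kHz / 16-bit mono PCM.
for kbps in (25, 30, 768):
    size = payload_bytes_per_frame(kbps * 1000)
    print(f"{kbps} kbps -> {size} bytes per 20 ms frame")
```

Under these assumptions a 25–30 kbps stream needs well under 100 bytes per frame, while raw PCM needs close to 2 KB for the same 20 ms of audio.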
Why new codecs are rare
The Meta MLow post puts it this way: "Building a good codec is quite challenging, and that is why we don't see new codecs emerging very often. The last widely known, good open-source codec was Opus, released in 2012." The frame the post sets is a 12-year gap between widely deployed general-purpose codecs (Opus in 2012, MLow in 2024), which makes audio codecs a canonically low-churn category of systems.
Techniques named in the MLow source
- CELP (Code Excited Linear Prediction). A classic voice codec family that underlies both Opus's SILK mode and MLow.
- Split-band coding. See concepts/split-band-audio-coding.
- Range encoding. Entropy coder used at the tail of MLow's pipeline.
- ML-based neural codecs (e.g., Meta's EnCodec). Learned encoder / decoder networks; high quality at very low bitrate; high compute cost.
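As an illustration of the "linear prediction" part of CELP, here is a minimal sketch that fits a short-term predictor to one frame using the autocorrelation method and the Levinson-Durbin recursion. It is not MLow's or Opus's implementation, and it covers only the prediction step: a real CELP coder also searches a codebook for the excitation that best reproduces the residual. All function names and the toy frame are assumptions made for this example.

```python
def autocorrelation(x, max_lag):
    """r[lag] = sum over n of x[n] * x[n + lag], for lag = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations; return (coeffs, residual_energy).

    coeffs[k - 1] is a_k in the predictor x_hat[n] = sum_k a_k * x[n - k].
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predicted frame
            break
        # Reflection coefficient for this order.
        k_i = (r[i] - sum(a[k] * r[i - k] for k in range(1, i))) / err
        prev = a[:]
        a[i] = k_i
        for k in range(1, i):
            a[k] = prev[k] - k_i * prev[i - k]
        err *= 1.0 - k_i * k_i
    return a[1:], err


def residual(x, coeffs):
    """Prediction error e[n] = x[n] - sum_k a_k * x[n - k], zero history before n = 0."""
    out = []
    for n in range(len(x)):
        pred = sum(a_k * x[n - k]
                   for k, a_k in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(x[n] - pred)
    return out


# Toy "frame": each sample is 0.9 times the previous one.
frame = [0.9 ** n for n in range(200)]
coeffs, err = levinson_durbin(autocorrelation(frame, 2), order=2)
e = residual(frame, coeffs)

signal_energy = sum(v * v for v in frame)
residual_energy = sum(v * v for v in e)
print("coeffs:", [round(c, 4) for c in coeffs])
print("energy before / after prediction:", round(signal_energy, 3), round(residual_energy, 3))
```

The predictor recovers the 0.9 decay, so the residual carries far less energy than the frame itself. That reduction is why coding the residual plus a handful of predictor coefficients is cheaper than coding the samples directly.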
Seen in
- sources/2024-06-13-meta-mlow-metas-low-bitrate-audio-codec — the canonical RTC-codec reference on this wiki.