CONCEPT Cited by 1 source

Audio codec

Definition

An audio codec (coder-decoder) compresses raw audio for transmission or storage and reconstructs it at the receiver. For real-time communication (RTC), the codec is the component that makes voice and video call audio deliverable over the public internet. Compression ratios are high: a raw mono audio stream at 48 kHz / 16-bit is 768 kbps (48,000 samples per second × 16 bits per sample), and modern codecs compress that to 25–30 kbps, a factor of roughly 25–30×, by exploiting psychoacoustics and voice-signal structure (Source: sources/2024-06-13-meta-mlow-metas-low-bitrate-audio-codec).
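The arithmetic behind those figures can be checked with a minimal sketch. The function names below are illustrative only and do not come from the source; the inputs are the 48 kHz / 16-bit mono stream and the 25–30 kbps range quoted above.

```python
def raw_pcm_bitrate_bps(sample_rate_hz: int, bit_depth: int, channels: int = 1) -> int:
    """Bitrate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bit_depth * channels


def compression_ratio(raw_bps: int, coded_bps: int) -> float:
    """How many times smaller the coded stream is than the raw stream."""
    return raw_bps / coded_bps


raw_bps = raw_pcm_bitrate_bps(48_000, 16)  # mono, 48 kHz, 16-bit
print(raw_bps // 1000, "kbps raw")         # 768 kbps raw

for coded_kbps in (25, 30):
    ratio = compression_ratio(raw_bps, coded_kbps * 1000)
    print(f"{coded_kbps} kbps -> {ratio:.1f}x")  # 30.7x and 25.6x
```

The lower end of the bitrate range gives the higher ratio: 25 kbps corresponds to about 30.7×, and 30 kbps to 25.6×.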

What a codec trades off

Every codec design balances three axes simultaneously (see concepts/quality-bitrate-complexity-tradeoff):

  • Quality — how faithfully the decoded audio matches the source (objectively measured via POLQA MOS and its predecessors, subjectively via listening tests).
  • Bitrate — compressed bits per second. Lower bitrate means more compression, fewer bytes on the wire.
  • Complexity — encoder + decoder CPU cost. Load-bearing for RTC on low-end handsets — see concepts/low-end-device-inclusion.
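To make "fewer bytes on the wire" concrete, the sketch below converts a bitrate into payload bytes per audio frame. The 20 ms frame duration is an assumption chosen for illustration, not a figure from the source, and the function name is hypothetical. Packet headers are ignored.

```python
def payload_bytes_per_frame(bitrate_bps: int, frame_ms: int = 20) -> float:
    """Codec payload size for one frame, excluding any packet headers.

    frame_ms defaults to 20 ms, an assumed frame duration for illustration.
    """
    bits_per_frame = bitrate_bps * frame_ms / 1000
    return bits_per_frame / 8


# Raw 48 kHz / 16-bit mono versus the coded bitrates quoted in the definition.
for label, bps in (("raw PCM", 768_000), ("30 kbps", 30_000), ("25 kbps", 25_000)):
    print(f"{label}: {payload_bytes_per_frame(bps)} bytes per 20 ms frame")
```

Under that assumption, a raw frame is 1920 bytes while a 30 kbps frame is 75 bytes, which is the difference the bitrate axis is measuring.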

Why new codecs are rare

The Meta MLow post frames new codecs as rare: "Building a good codec is quite challenging, and that is why we don't see new codecs emerging very often. The last widely known, good open-source codec was Opus, released in 2012." By that framing, 12 years passed between widely deployed general-purpose codecs (Opus in 2012, MLow in 2024), which makes audio codecs a notably low-churn category of systems.

Techniques named in the MLow source

  • CELP (Code Excited Linear Prediction). Classic voice codec family that underlies both Opus's SILK mode and MLow.
  • Split-band coding. See concepts/split-band-audio-coding.
  • Range encoding. Entropy coder used at the tail of MLow's pipeline.
  • ML-based neural codecs (e.g., Meta's Encodec). Learned encoder / decoder networks that reach high quality at very low bitrate, at a high compute cost.
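As a rough illustration of why an entropy coder such as a range encoder sits at the end of a codec pipeline, the sketch below computes the ideal average cost per symbol (the Shannon entropy) for a skewed symbol distribution and compares it with a fixed-length code. This is not MLow's coder, and the probabilities are hypothetical values chosen for the example; a range encoder approaches the entropy figure but is not implemented here.

```python
import math


def ideal_bits_per_symbol(probabilities):
    """Shannon entropy in bits: the lower bound an entropy coder approaches."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical distribution over four quantized values; one is much more likely.
skewed = [0.5, 0.25, 0.125, 0.125]

fixed_bits = math.log2(len(skewed))          # fixed-length code: 2 bits per symbol
entropy_bits = ideal_bits_per_symbol(skewed)  # 1.75 bits per symbol

print(f"fixed-length: {fixed_bits} bits, entropy bound: {entropy_bits} bits")
```

The more skewed the symbol statistics, the larger the gap between the fixed-length cost and the entropy bound, and the more an entropy coder can save.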

Seen in

Last updated · 319 distilled / 1,201 read