

Video transcoding

Definition

Video transcoding is the act of decoding a source video and re-encoding it to one or more target encodings, optionally changing resolution, codec, framerate, container format, or perceived quality level. It is the canonical operation of any video-serving infrastructure: user-uploaded content in an arbitrary codec/container/resolution mix cannot be served directly to every client, so the platform must transform it (Source: sources/2026-03-09-meta-ffmpeg-at-meta-media-processing-at-scale).

The industry-standard toolchain is FFmpeg. At hyperscale, transcoding is a first-class compute cost: Meta alone invokes FFmpeg/ffprobe tens of billions of times per day, driven by more than a billion daily video uploads that each trigger multiple invocations.

Core operations

A transcoding job typically composes:

  1. Demux — read the container format, separate audio / video / subtitle streams.
  2. Decode — reverse the codec compression to raw frames (YUV for video, PCM for audio).
  3. Filter (optional) — scale, crop, rotate, colour correct, watermark.
  4. Encode — apply the target codec at the target bitrate/quality. This is usually the compute-dominant step.
  5. Mux — wrap the encoded streams into the target container (e.g. fMP4 for DASH segments).
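As a concrete sketch, the five stages collapse into a single FFmpeg invocation. The helper below only assembles the command line; the file names, codec choices, and 720p target are illustrative placeholders, and running the result assumes an `ffmpeg` binary on the PATH.

```python
import shlex

def build_transcode_cmd(src: str, dst: str, vcodec: str = "libx264",
                        acodec: str = "aac", height: int = 720) -> list[str]:
    """Assemble one FFmpeg invocation covering all five stages.

    Demux + decode are implied by -i; mux is implied by the output path,
    whose extension selects the container.
    """
    return ["ffmpeg",
            "-i", src,                    # demux + decode the source
            "-vf", f"scale=-2:{height}",  # filter: resize, keep aspect (even width)
            "-c:v", vcodec,               # encode video with the target codec
            "-c:a", acodec,               # encode audio
            dst]                          # mux into the target container

cmd = build_transcode_cmd("input.mov", "output.mp4")
print(shlex.join(cmd))
# ffmpeg -i input.mov -vf scale=-2:720 -c:v libx264 -c:a aac output.mp4
```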

Why multi-output pipelines matter

Any DASH / HLS / adaptive-streaming pipeline needs multiple output encodings from the same source — an encoding "ladder" of resolutions/codecs/bitrates that a player can switch between at runtime (see concepts/adaptive-bitrate-streaming-dash and concepts/multi-lane-encoding-pipeline). Running one FFmpeg process per output re-decodes the source each time; running one FFmpeg process with multiple outputs decodes once and shares frames across encoders (see patterns/deduplicate-decode-across-encoder-lanes).
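A minimal sketch of the single-process variant: FFmpeg's `split` filter fans the once-decoded frames out to per-rung `scale` + encode lanes. The three-rung ladder, bitrates, file names, and `libx264` are illustrative assumptions (audio mapping is omitted for brevity).

```python
def build_ladder_cmd(src: str, ladder: list[tuple[int, str]]) -> list[str]:
    """One FFmpeg process: decode `src` once, split to N encoder lanes."""
    n = len(ladder)
    # split the decoded frames, then scale each branch to its rung's height
    graph = f"[0:v]split={n}" + "".join(f"[s{i}]" for i in range(n)) + ";"
    graph += ";".join(f"[s{i}]scale=-2:{h}[v{i}]"
                      for i, (h, _) in enumerate(ladder))
    cmd = ["ffmpeg", "-i", src, "-filter_complex", graph]
    for i, (h, bitrate) in enumerate(ladder):
        # one -map / encoder / output per rung, all fed by the shared decode
        cmd += ["-map", f"[v{i}]", "-c:v", "libx264", "-b:v", bitrate,
                f"out_{h}p.mp4"]
    return cmd

cmd = build_ladder_cmd("mezzanine.mp4",
                       [(1080, "5M"), (720, "2.5M"), (360, "800k")])
```

Note the structural point the pattern relies on: the command contains exactly one `-i` (one demux + decode) but three `-map`/encoder/output groups.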

Hardware acceleration

Both decoding and encoding can be offloaded to dedicated fixed-function hardware. FFmpeg exposes such devices through a common abstraction (its hwaccel / hardware device context API), so pipeline code is largely the same whether the underlying silicon is NVIDIA NVENC/NVDEC, Intel Quick Sync Video, AMD's UVD/VCN blocks, or Meta's in-house MSVP ASIC.
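A sketch of how little the pipeline shape changes per vendor: swapping the `-hwaccel` device name and the encoder suffix is roughly the whole difference. The two-entry table and file names are illustrative, and MSVP's integration details are not covered here.

```python
def build_hw_cmd(src: str, dst: str, device: str = "cuda") -> list[str]:
    """Same transcode, with decode and encode offloaded to an accelerator."""
    encoders = {"cuda": "h264_nvenc",  # NVIDIA NVENC
                "qsv": "h264_qsv"}     # Intel Quick Sync Video
    return ["ffmpeg",
            "-hwaccel", device,        # offload the decode
            "-i", src,
            "-c:v", encoders[device],  # offload the encode
            dst]

cmd = build_hw_cmd("input.mp4", "output.mp4")
```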

Quality metrics

A transcode is a lossy transformation, so output quality is a first-class concern. Metrics like PSNR, SSIM, and VMAF compare reference frames decoded from the source against frames decoded from the transcoded output to score perceived quality loss (see concepts/visual-quality-metric). For livestreams, metrics must be computed during the transcode rather than afterwards (see concepts/in-loop-quality-metrics).
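PSNR is the simplest of the three; a self-contained sketch of the standard formula, 10 · log10(MAX² / MSE), over flattened 8-bit frames (the toy frame values are illustrative):

```python
import math

def psnr(ref: list[int], dist: list[int], max_val: int = 255) -> float:
    """Peak signal-to-noise ratio between two same-sized frames, in dB."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, dist)) / len(ref)
    if mse == 0:
        return math.inf  # identical frames: no measurable loss
    return 10 * math.log10(max_val ** 2 / mse)

# a frame where every pixel is off by 2 -> MSE = 4
ref = [100] * 16
dist = [102] * 16
print(round(psnr(ref, dist), 2))  # 42.11
```

SSIM and VMAF replace the per-pixel error term with structural and learned perceptual models, but follow the same reference-vs-output comparison shape.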

The audio sibling is concepts/audio-codec: compress raw audio (e.g. 768 kbps PCM) down to 25–30 kbps for transmission. Meta's MLow codec is its proprietary answer to the audio-transcoding problem at RTC scale; it is not part of the FFmpeg pipeline story, which covers stored and livestream video, a separate domain.
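A quick sanity check of the 768 kbps figure (assuming 48 kHz, 16-bit, mono PCM, parameters consistent with the number in the text but not stated there):

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Raw PCM bitrate before any codec touches it."""
    return sample_rate_hz * bit_depth * channels / 1000

rate = pcm_bitrate_kbps(48_000, 16, 1)
print(rate)                 # 768.0
print(round(rate / 28, 1))  # ~27.4x compression to hit ~28 kbps
```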
