CONCEPT Cited by 2 sources
# Video transcoding
## Definition
Video transcoding is the act of decoding a source video and re-encoding it to one or more target encodings, optionally changing resolution, codec, framerate, container format, or perceived quality level. It is the canonical operation of any video-serving infrastructure: user-uploaded content in an arbitrary codec/container/resolution mix cannot be served directly to every client, so the platform must transform it (Source: sources/2026-03-09-meta-ffmpeg-at-meta-media-processing-at-scale).
The industry-standard toolchain is FFmpeg. At hyperscale, transcoding is a first-class compute cost: Meta alone invokes FFmpeg and ffprobe tens of billions of times per day, driven by more than 1 billion video uploads per day that each trigger multiple invocations.
## Core operations
A transcoding job typically composes:
- Demux — read the container format, separate audio / video / subtitle streams.
- Decode — reverse the codec compression to raw frames (YUV for video, PCM for audio).
- Filter (optional) — scale, crop, rotate, colour correct, watermark.
- Encode — apply the target codec at the target bitrate/quality. This is usually the compute-dominant step.
- Mux — wrap the encoded streams into the target container (e.g. fMP4 for DASH segments).
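A minimal sketch of how these stages compose into a single FFmpeg invocation. The command-line flags are standard FFmpeg options; the helper function, its name, and its defaults are illustrative, not part of any source:

```python
def transcode_cmd(src, dst, *, vcodec="libx264", height=None, crf=23):
    """Build an ffmpeg argv composing demux -> decode -> filter -> encode -> mux.

    ffmpeg infers the demuxer/decoder from the input and the muxer from
    the output extension; the flags below drive the remaining stages.
    """
    args = ["ffmpeg", "-i", src]                 # demux + decode the source
    if height is not None:
        args += ["-vf", f"scale=-2:{height}"]    # optional filter stage
    args += ["-c:v", vcodec, "-crf", str(crf)]   # encode (compute-dominant)
    args += ["-c:a", "aac"]                      # re-encode the audio stream
    args += [dst]                                # mux into the target container
    return args

# e.g. transcode_cmd("in.mov", "out.mp4", height=720)
```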
## Why multi-output pipelines matter
Any DASH / HLS / adaptive-streaming pipeline needs multiple output encodings from the same source — an encoding "ladder" of resolutions/codecs/bitrates that a player can switch between at runtime (see concepts/adaptive-bitrate-streaming-dash and concepts/multi-lane-encoding-pipeline). Running one FFmpeg process per output re-decodes the source each time; running one FFmpeg process with multiple outputs decodes once and shares frames across encoders (see patterns/deduplicate-decode-across-encoder-lanes).
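One way to express the shared-decode version is a single ffmpeg invocation that splits the decoded frames with a filtergraph and maps each branch to its own encoder. The `-filter_complex`, `split`, `scale`, and `-map` options are real FFmpeg features; the helper and the example ladder rungs are an illustrative sketch:

```python
def ladder_cmd(src, rungs):
    """One ffmpeg process: one decode, N encodes.

    rungs: list of (height, bitrate, outfile),
    e.g. [(1080, "5M", "out1080.mp4"), (720, "3M", "out720.mp4")].
    The decoded stream is split once; each branch is scaled and encoded
    independently, so the source is never re-decoded per output.
    """
    n = len(rungs)
    pads = "".join(f"[s{i}]" for i in range(n))
    chains = [f"[0:v]split={n}{pads}"]
    for i, (height, _, _) in enumerate(rungs):
        chains.append(f"[s{i}]scale=-2:{height}[v{i}]")
    args = ["ffmpeg", "-i", src, "-filter_complex", ";".join(chains)]
    for i, (_, bitrate, outfile) in enumerate(rungs):
        args += ["-map", f"[v{i}]", "-c:v", "libx264", "-b:v", bitrate, outfile]
    return args
```

Running the per-output alternative would instead spawn `len(rungs)` processes, each paying the full decode cost again.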
## Hardware acceleration
Both decoding and encoding can be offloaded to dedicated fixed-function hardware. FFmpeg exposes all such devices through a common abstraction — the hardware-accelerated video codec API — so pipeline code is largely the same whether the underlying silicon is NVIDIA's NVENC/NVDEC, Intel Quick Sync Video, AMD's VCN (successor to the UVD/VCE blocks), or Meta's in-house MSVP ASIC.
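A sketch of what that common abstraction buys: the pipeline code stays fixed and only a vendor-to-flags table changes. The encoder names (`h264_nvenc`, `h264_qsv`, `h264_amf`) and `-hwaccel` values are real FFmpeg identifiers; the table and helper are illustrative, not FFmpeg API:

```python
# Real FFmpeg names for H.264 paths on each vendor's silicon; the
# mapping and helper are an illustrative sketch, not FFmpeg API.
HW = {
    "nvidia": {"hwaccel": "cuda", "encoder": "h264_nvenc"},
    "intel":  {"hwaccel": "qsv",  "encoder": "h264_qsv"},
    "amd":    {"hwaccel": None,   "encoder": "h264_amf"},  # AMF encode; CPU decode here
}

def hw_transcode_cmd(src, dst, vendor):
    cfg = HW[vendor]
    args = ["ffmpeg"]
    if cfg["hwaccel"]:
        args += ["-hwaccel", cfg["hwaccel"]]      # hardware-accelerated decode
    args += ["-i", src, "-c:v", cfg["encoder"], dst]  # hardware encode
    return args
```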
## Quality metrics
A transcode is a lossy transformation, so output quality is a first-class concern. Metrics like PSNR, SSIM, and VMAF compare reference frames from the source against decoded frames from the transcoded output to score the quality loss — see concepts/visual-quality-metric. For livestreams, the metrics must be computed during the transcode itself — see concepts/in-loop-quality-metrics.
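As a concrete instance, PSNR is just log-scaled mean squared error between reference and output samples. A pure-Python sketch over 8-bit luma values (the tiny example "frame" is invented for illustration):

```python
import math

def psnr(reference, distorted, peak=255):
    """PSNR in dB between two equal-length 8-bit sample sequences.

    Full-reference metrics like PSNR/SSIM/VMAF all share this shape:
    align reference and decoded-output frames, then score the difference.
    """
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return float("inf")  # identical frames: no measurable loss
    return 10 * math.log10(peak ** 2 / mse)

# A tiny "frame" that lost a little detail in the encode:
# psnr([100, 120, 140, 160], [101, 119, 141, 159])  -> ~48 dB
```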
## Related audio analogue
The audio sibling is concepts/audio-codec: compress raw audio (e.g. 768 kbps PCM) down to 25–30 kbps for transmission. Meta's MLow codec is its proprietary answer to the audio-transcoding problem at RTC scale; it is not part of the FFmpeg pipeline story, which covers stored and livestreamed video, a separate domain.
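The quoted rates imply roughly a 25–30× reduction, assuming the 768 kbps figure refers to mono 16-bit PCM at 48 kHz (that correspondence is an inference, not stated in the source):

```python
# 768 kbps raw rate: 48 kHz sample rate x 16-bit samples, one channel
pcm_kbps = 48_000 * 16 // 1000                    # -> 768
compression = [round(pcm_kbps / kbps, 1) for kbps in (30, 25)]
# -> [25.6, 30.7], i.e. ~26-31x smaller than raw PCM
```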
## Seen in
- sources/2026-03-09-meta-ffmpeg-at-meta-media-processing-at-scale — canonical scale + architecture reference.
- sources/2025-07-03-netflix-av1scale-film-grain-synthesis-the-awakening — Netflix's AV1 Film Grain Synthesis rollout. Transcoding is restructured into a three-stage denoise → encode → synthesize pipeline when the codec emits a side-channel of synthesis parameters: the compressed bitstream carries the denoised signal plus AR coefficients and a scaling function, and the decoder reconstructs the grain. Canonical wiki instance of decoder-side synthesis for compression.
## Related
- concepts/multi-lane-encoding-pipeline
- concepts/adaptive-bitrate-streaming-dash
- concepts/visual-quality-metric
- concepts/hardware-accelerated-video-codec-api
- concepts/audio-codec
- concepts/film-grain-synthesis, concepts/denoise-encode-synthesize
- systems/ffmpeg, systems/meta-msvp, systems/av1-codec
- patterns/decoder-side-synthesis-for-compression