Meta — FFmpeg at Meta: Media Processing at Scale¶
Summary¶
Meta Engineering's 2026-03-09 post describes how Meta has deprecated its long-standing internal FFmpeg fork for all DASH video-on-demand (VOD) and livestreaming pipelines, migrating to vanilla upstream FFmpeg 8.0. The migration was unblocked by two features Meta co-developed upstream with FFlabs and VideoLAN over multiple releases: (1) efficient threaded multi-lane transcoding (landed across FFmpeg 6.0-8.0), which generates multiple encodings for an adaptive-streaming ladder from a single decode, and (2) in-loop quality metrics during transcoding (FFmpeg 7.0+), which insert a decoder after each encoder so reference metrics like PSNR/SSIM/VMAF can be computed in real time during livestreams. The post also explains the inverse decision: Meta keeps its MSVP (Meta Scalable Video Processor) ASIC patches internal, because MSVP hardware is Meta-only and FFmpeg developers cannot validate changes without it. The architectural frame is simple: Meta runs ffmpeg + ffprobe tens of billions of times per day, so per-process efficiency wins compound to fleet-level savings, and carrying an internal fork is a long-term liability worth spending years upstream to remove.
Key takeaways¶
- Invocation scale sets the architectural frame. "Meta executes ffmpeg (the main CLI application) and ffprobe (a utility for obtaining media file properties) binaries tens of billions of times a day, introducing unique challenges when dealing with media files." Any per-process overhead multiplied by 10^10 matters; that's why de-duplication (one decode → N encodes) and de-forkification (one upstream codebase, not upstream plus a private fork) both pay off. (Source text; see concepts/video-transcoding)
- Upload scale motivates multi-lane efficiency. "Given that we process over 1 billion video uploads daily, each requiring multiple FFmpeg executions, reductions in per-process compute usage yield significant efficiency gains." Anchors patterns/deduplicate-decode-across-encoder-lanes. (Source text)
- The lane-per-process anti-pattern is explicit. "In a very simple system separate FFmpeg command lines can generate the encodings for each lane one-by-one in serial. This could be optimized by running each command in parallel, but this quickly becomes inefficient due to the duplicate work done by each process." Each process would re-decode the same source. (Source text; see concepts/multi-lane-encoding-pipeline)
- Single-process multi-output deduplicates the decode. "To work around this, multiple outputs could be generated within a single FFmpeg command line, decoding the frames of a video once and sending them to each output's encoder instance. This eliminates a lot of overhead by deduplicating the video decoding and process startup time overhead incurred by each command line." (Source text)
- Parallelised encoding was the Meta-fork-specific win until FFmpeg 6.0-8.0 upstreamed it. "Our internal FFmpeg fork provided an additional optimization to this: parallelized video encoding. While individual video encoders are often internally multi-threaded, previous FFmpeg versions executed each encoder in serial for a given frame when multiple encoders were in use. By running all encoder instances in parallel, better parallelism can be obtained overall." The FFmpeg community then delivered "the most complex refactoring of FFmpeg in decades" with contributions from FFlabs and VideoLAN, "starting with FFmpeg 6.0, with the finishing touches landing in 8.0." (Source text; see patterns/upstream-the-fix)
- Livestream quality metrics require in-loop decoding. "FFmpeg can compute various visual quality metrics such as PSNR, SSIM, and VMAF using two existing encodings in a separate command line after encoding has finished. This is okay for offline or VOD use cases, but not for livestreaming where we might want to compute quality metrics in real time." The mechanism: "insert a video decoder after each video encoder used by each output lane. These provide bitmaps for each frame in the video after compression has been applied so that we can compare against the frames before compression. In the end, we can produce a quality metric for each encoded lane in real time using a single FFmpeg command line." Upstreamed as "in-loop decoding" in FFmpeg 7.0+. (Source text; see concepts/in-loop-quality-metrics, patterns/in-loop-decoder-for-realtime-quality-metrics)
- Reference vs no-reference metrics, named. "These metrics are categorized as reference or no-reference metrics, where the former compares a reference encoding to some other distorted encoding." The post's metrics (PSNR / SSIM / VMAF) are all reference metrics — hence the need for the original bitmaps alongside the encoded outputs. (Source text; see concepts/visual-quality-metric)
- Hardware-accelerated codecs integrate through standard FFmpeg APIs. "FFmpeg supports hardware-accelerated decoding, encoding, and filtering with devices such as NVIDIA's NVDEC and NVENC, AMD's Unified Video Decoder (UVD), and Intel's Quick Sync Video (QSV). Each device is supported through an implementation of standard APIs in FFmpeg, allowing for easier integration and minimizing the need for device-specific command line flags. We've added support for the Meta Scalable Video Processor (MSVP), our custom ASIC for video transcoding, through these same APIs, enabling the use of common tooling across different hardware platforms with minimal platform-specific quirks." (Source text; see concepts/hardware-accelerated-video-codec-api)
- Infra-specific patches stay internal — intentionally. "As MSVP is only used within Meta's own infrastructure, it would create a challenge for FFmpeg developers to support it without access to the hardware for testing and validation. In this case, it makes sense to keep patches like this internal since they wouldn't provide benefit externally. We've taken on the responsibility of rebasing our internal patches onto more recent FFmpeg versions over time, utilizing extensive validation to ensure robustness and correctness during upgrades." The complement to the upstreaming decision: when the benefit doesn't generalise, the fork cost is accepted. (Source text; see patterns/keep-infrastructure-specific-patches-internal)
- Fork deprecation is the payoff. "With more efficient multi-lane encoding and real-time quality metrics, we were able to fully deprecate our internal FFmpeg fork for all VOD and livestreaming pipelines." The fork lived "for many years"; multi-year upstream collaboration was the exit cost. Ongoing commitment: "We plan to continue investing in FFmpeg in partnership with open source developers, bringing benefits to Meta, the wider industry, and people who use our products." (Source text)
- Reverse-rebase is the accepted cost of a private fork. "We've taken on the responsibility of rebasing our internal patches onto more recent FFmpeg versions over time, utilizing extensive validation to ensure robustness and correctness during upgrades." This is the baseline operational cost of any internal patch that doesn't make sense to upstream; Meta's MSVP integration inherits it because of the hardware-access gap. (Source text)
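The one-decode, N-encode shape described in the takeaways above can be sketched as a single FFmpeg command line. This is a hedged illustration, not Meta's actual invocation: filenames, lane count, resolutions, codec, and bitrates are invented, and a production DASH pipeline would additionally segment and package the outputs.

```shell
# One input decode feeds three encoder lanes (hypothetical ladder).
# Each -map 0:v reuses the same decoded frames; -s rescales per output,
# so the expensive decode happens once instead of once per lane.
ffmpeg -i input.mp4 \
  -map 0:v -s 1920x1080 -c:v libx264 -b:v 5M lane_1080.mp4 \
  -map 0:v -s 1280x720  -c:v libx264 -b:v 3M lane_720.mp4 \
  -map 0:v -s 640x360   -c:v libx264 -b:v 1M lane_360.mp4
```

On FFmpeg 8.0 the encoder instances in such a command also run in parallel per frame, which is the upstreamed form of the fork's optimisation.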
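The in-loop metric mechanism corresponds to the loopback decoders the ffmpeg CLI gained in 7.0 (`-dec`), which decode an encoder's output inside the same process. The sketch below adapts the pattern shown in the ffmpeg documentation, feeding the decoded lane and the source into the `psnr` filter; the input name, CRF value, and stats path are illustrative.

```shell
# Encode one lane, loop its bitstream back through a decoder
# (-dec 0:0 decodes output file 0, stream 0), and compare the
# post-compression frames against the pre-compression source frames.
ffmpeg -i input.mp4 \
  -map 0:v:0 -c:v libx264 -crf 23 -f null - \
  -dec 0:0 \
  -filter_complex '[dec:0][0:v]psnr=stats_file=psnr_lane0.log[v]' \
  -map '[v]' -f null -
```

Per-frame PSNR accumulates in the stats file while encoding is still running; repeating the `-dec`/`psnr` pair for each lane yields one live metric stream per lane from a single command.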
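Because each accelerator sits behind the same FFmpeg APIs, switching hardware is mostly a codec-name change. A hypothetical NVENC transcode using standard upstream flags (the QSV equivalent would swap in `h264_qsv`; per the post, Meta's internal build targets MSVP through these same APIs):

```shell
# Decode and encode on the GPU; -hwaccel_output_format cuda keeps
# frames in device memory between decoder and encoder.
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 \
  -c:v h264_nvenc -b:v 5M out.mp4
```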
Systems extracted¶
- systems/ffmpeg — the open-source multimedia CLI Meta executes tens of billions of times per day. Upstream features Meta co-developed or depends on: threaded multi-lane transcoding (6.0-8.0), in-loop decoding for real-time quality metrics (7.0+). 25+ years of active development; the de facto industry standard for media processing.
- systems/ffprobe — FFmpeg's companion utility for inspecting media file properties (codec, container, stream metadata). Meta runs it at the same invocation volume as ffmpeg.
- systems/meta-msvp — the Meta Scalable Video Processor, Meta's custom video-transcoding ASIC. Integrated into Meta's internal FFmpeg via the same hardware-acceleration APIs that upstream uses for NVDEC/NVENC/UVD/QSV; patches kept internal because external FFmpeg developers cannot test against MSVP hardware.
- systems/nvidia-nvenc-nvdec — stub. NVIDIA's hardware video encode/decode engines exposed on GeForce/Tesla/Datacenter GPUs. Named in the post as one of the hardware-accelerator classes FFmpeg already supports via standard APIs.
- systems/intel-quick-sync-video — stub. Intel's integrated hardware media engine (QSV) present on most modern Intel client + server CPUs. Named in the post as a second pre-existing hardware-accel target FFmpeg supports via standard APIs.
Concepts extracted¶
New:
- concepts/video-transcoding — decode a source video and re-encode it to one or more output encodings, optionally changing resolution / codec / framerate / quality level. The general primitive FFmpeg automates.
- concepts/adaptive-bitrate-streaming-dash — Dynamic Adaptive Streaming over HTTP: a player dynamically selects between multiple pre-encoded renditions of the same video based on network conditions. Requires a multi-lane encoding ladder at production time.
- concepts/multi-lane-encoding-pipeline — architectural shape of a video transcoding pipeline that produces multiple outputs from one source (the "ladder"). Lanes differ in resolution / codec / framerate / quality; all are derived from the same decoded frames.
- concepts/in-loop-quality-metrics — computing perceptual quality metrics during transcoding (not after) by inserting a decoder after each encoder, so frames-before-compression can be compared against frames-after-compression for each lane in a single FFmpeg command.
- concepts/visual-quality-metric — numeric representation of perceived visual quality loss from compression; reference metrics (PSNR, SSIM, VMAF) compare a reference to a distorted encoding; no-reference metrics score a single encoding.
- concepts/hardware-accelerated-video-codec-api — FFmpeg's pattern of exposing each hardware encoder/decoder (NVENC, NVDEC, UVD, QSV, MSVP) through a shared standardised API, so pipelines can target the hardware with minimal device-specific flags.
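For concreteness, the simplest of the reference metrics listed above has a closed form (standard definition, not taken from the post): for 8-bit video,

$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}},$$

where MSE is the mean squared error between the reference and distorted frames. SSIM and VMAF are more perceptually weighted, but all three share the reference-vs-distorted shape that makes the in-loop decoder necessary.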
Patterns extracted¶
New:
- patterns/deduplicate-decode-across-encoder-lanes — one FFmpeg command, one decoder instance, N parallel encoder instances feeding a DASH ladder. Eliminates per-lane decode + process-startup overhead; enables per-frame encoder parallelism. Canonical Meta-driven FFmpeg upstream win (6.0 → 8.0).
- patterns/in-loop-decoder-for-realtime-quality-metrics — insert a decoder after each encoder in a multi-lane pipeline, compare pre- vs post-compression bitmaps, emit reference quality metrics live for each lane. Unblocks VMAF/SSIM/PSNR for livestreams where post-hoc comparison is not an option.
- patterns/keep-infrastructure-specific-patches-internal — the explicit complement to upstream the fix: when a patch is tied to infrastructure external contributors can't test (here: a Meta-only ASIC), keeping it internal is correct, and the reverse-rebase cost against newer upstream releases is the operational price of that choice.
Updated:
- patterns/upstream-the-fix — extended with Meta × FFmpeg (6.0 → 8.0) as a new canonical multi-year, multi-party (Meta + FFlabs + VideoLAN) instance of upstreaming a fork's load-bearing features so the fork can be retired.
Operational numbers¶
- FFmpeg/ffprobe invocations: tens of billions per day.
- Video uploads processed daily: > 1 billion, each requiring multiple FFmpeg executions.
- FFmpeg versions involved: 6.0 → 8.0 for multi-lane threading; 7.0+ for in-loop decoding.
- FFmpeg project age: 25+ years of active development.
- Internal fork status: fully deprecated for VOD + livestreaming pipelines as of this post; MSVP-specific patches remain internal.
Caveats¶
- No specific fleet sizes, CPU-seconds-saved, or before/after efficiency numbers for the multi-lane win — the post argues the case qualitatively.
- No specifics on how many DASH lanes Meta encodes per upload, codec mix, or target bitrate ladder.
- MSVP is linked but not described in this post; MSVP's own Meta AI post (meta-scalable-video-processor-MSVP) is the canonical reference and is not ingested in this wiki yet.
- Quality-metric numbers (what PSNR/SSIM/VMAF thresholds Meta actually alerts on, per-lane or per-codec) are not disclosed.
- The post doesn't enumerate which codecs Meta uses (H.264, H.265/HEVC, AV1); the architecture is codec-agnostic by design but the real mix isn't shared.
- Open-source contribution scope is named generically ("FFmpeg developers, including those at FFlabs and VideoLAN"); individual patch series / mailing-list threads are not linked except for one X/Twitter post pointing at the threading refactor's complexity.
- "End-to-end encryption" is not relevant here (this is unicast/broadcast video infrastructure, not messaging); unlike MLow or Private Processing, privacy/E2EE posture is out of scope for this post.
Source¶
- Original: https://engineering.fb.com/2026/03/02/video-engineering/ffmpeg-at-meta-media-processing-at-scale/
- Raw markdown: raw/meta/2026-03-09-ffmpeg-at-meta-media-processing-at-scale-23756d8e.md
Related¶
- companies/meta — 16th Meta source on the wiki; opens video-transcoding-infrastructure as a new technical domain distinct from the prior MLow audio codec (RTC audio) and the storage/GenAI/privacy/source-control corpus.
- patterns/upstream-the-fix — this post is a second-generation reinforcement after the Cloudflare V8 / Node.js / OpenNext instances: where Cloudflare's examples were single targeted PRs, Meta's FFmpeg story is a multi-year multi-release refactor culminating in fork retirement, the most architecturally consequential outcome of the pattern.
- patterns/keep-infrastructure-specific-patches-internal — the new complement pattern the post introduces alongside upstream-the-fix; together they form the framework Meta uses to decide each patch's destiny.
- sources/2024-06-13-meta-mlow-metas-low-bitrate-audio-codec — sibling Meta media-codec post: MLow is RTC audio, this post is stored/live video. Both care about compute efficiency at scale, but MLow ships a proprietary codec whereas this post tells the opposite story — deprecating a proprietary fork in favour of upstream.