CONCEPT Cited by 3 sources
Visual quality metric¶
Definition¶
A visual quality metric assigns a numeric score to the perceived visual quality of a video frame or stream — in practice, an estimate of how much perceptual quality has been lost in encoding.
Two categories (Source: sources/2026-03-09-meta-ffmpeg-at-meta-media-processing-at-scale):
- Reference metrics. Compare a reference encoding to a distorted encoding frame-by-frame. Require both streams to be available at comparison time. Named in the Meta post: PSNR, SSIM, VMAF.
- No-reference metrics. Score a single encoding without access to the source. Useful for black-box assessments; generally less accurate than reference metrics.
Canonical reference metrics¶
- PSNR (Peak Signal-to-Noise Ratio). Signal-processing classic; decibel-scale ratio of peak signal to reconstruction error. Easy to compute, but a weak predictor of perceived quality on many distortion classes.
- SSIM (Structural Similarity). Compares structural information (luminance, contrast, structure) between reference and distorted. Better perceptual correlation than PSNR on typical artifacts.
- VMAF (Video Multi-method Assessment Fusion). Netflix-originated machine-learning ensemble over multiple low-level features, trained against human MOS ratings on diverse content. State-of-the-art reference metric for streaming video quality.
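A minimal PSNR computation makes the "decibel-scale ratio of peak signal to reconstruction error" concrete. The toy frames below are illustrative; real pipelines compute this per plane on decoded YUV.

```python
import numpy as np

def psnr(reference: np.ndarray, distorted: np.ndarray, peak: float = 255.0) -> float:
    """PSNR in dB: peak signal power over mean squared reconstruction error."""
    mse = np.mean((reference.astype(np.float64) - distorted.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return float(10.0 * np.log10(peak ** 2 / mse))

# Toy 8-bit "frames": a gradient and a copy with mild uniform noise.
rng = np.random.default_rng(0)
ref = np.tile(np.arange(256, dtype=np.uint8), (64, 1))
dist = np.clip(ref.astype(np.int16) + rng.integers(-5, 6, ref.shape), 0, 255).astype(np.uint8)
print(round(psnr(ref, dist), 1))  # roughly 38 dB for ±5 uniform noise
```

Note that the same ±5 noise would score identically on a flat frame and a detailed one — one illustration of why PSNR correlates weakly with perception.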
Why reference metrics force pipeline design¶
Reference metrics need both source frames and encoded frames at comparison time. For VOD this is trivial post hoc: both are completed files. For livestreaming it is not — the source frames are gone once the encode has streamed out — so the pipeline has to produce and compare frames during the encode. That's what concepts/in-loop-quality-metrics and patterns/in-loop-decoder-for-realtime-quality-metrics exist to solve.
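The constraint can be sketched as scoring each frame in-loop, while its source is still in hand. The quantizing "codec" and MSE score below are illustrative stand-ins, not a real encoder:

```python
import numpy as np

def live_inloop_scores(source_frames, encode_decode_frame, score):
    """Sketch: in a live encode, pair each source frame with the decoded
    output of its own encode before the source frame is discarded."""
    for src in source_frames:
        recon = encode_decode_frame(src)   # decode of the just-encoded frame
        yield score(src, recon)            # e.g. per-frame PSNR/SSIM/VMAF

# Toy stand-ins (assumptions, not a real codec): "encoding" quantizes to 16 levels.
quantize = lambda f: (f // 16) * 16
mse = lambda a, b: float(np.mean((a.astype(float) - b.astype(float)) ** 2))

frames = (np.full((4, 4), v, dtype=np.uint8) for v in (10, 100, 200))
scores = list(live_inloop_scores(frames, quantize, mse))
print(scores)  # [100.0, 16.0, 64.0]
```

The generator shape matters: frames arrive, are scored, and are released; nothing assumes a completed file on either side.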
Where reference metrics break down: synthesis-based tools¶
Reference metrics assume the decoded output is meant to be a sample-wise reconstruction of the source. That assumption fails for codec tools that emit parametric reconstructions instead of compressed signals. The canonical case on this wiki is AV1 Film Grain Synthesis (FGS) (Source: sources/2025-07-03-netflix-av1scale-film-grain-synthesis-the-awakening): the encoder denoises the source, compresses the clean signal, and transmits AR coefficients + a piecewise-linear scaling function as grain metadata. The decoder re-synthesizes the grain from those parameters. The synthesized grain is statistically similar to the source grain but not sample-wise identical — it is a new noise instance drawn from the same AR model.
Reference metrics comparing decoded-against-source frame-by-frame will therefore score FGS output as "heavily distorted" even when viewers cannot tell the synthesized output apart from the original. Netflix's at-scale-FGS post hedges its quality claim accordingly, speaking of "high-quality video with less data while preserving the artistic integrity of film grain" rather than of higher VMAF.
Alternatives teams use to evaluate synthesis-based codec tools:
- Perceptual side-by-side comparisons with human viewers.
- Denoised-signal reference metrics — compare decoded-clean against source-denoised (both sides of the encoder denoise stage), isolating the compression quality from the grain substitution.
- No-reference quality models scoring the synthesized component on its own statistical characteristics (spectral density, intensity-vs-brightness curves) without reference to the original sample grain.
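The second alternative — denoised-signal reference metrics — can be sketched numerically. The signals and the box-filter denoiser below are illustrative assumptions (production denoisers are far more sophisticated); the point is that denoising both sides of the split stops the grain substitution from dominating the score:

```python
import numpy as np

rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 8 * np.pi, 2000))           # underlying image signal
source = clean + rng.normal(0, 0.3, clean.size)           # grainy camera master
decoded_clean = clean + rng.normal(0, 0.02, clean.size)   # decoded clean signal, small coding error

def box_denoise(x: np.ndarray, k: int = 9) -> np.ndarray:
    """Crude moving-average denoiser, a stand-in for the encoder's denoise stage."""
    return np.convolve(x, np.ones(k) / k, mode="same")

naive = float(np.mean((source - decoded_clean) ** 2))              # grain counted as distortion
fair = float(np.mean((box_denoise(source) - decoded_clean) ** 2))  # denoised-signal reference
print(round(naive, 3), round(fair, 3))
```

The naive score is dominated by the grain variance; the denoised-reference score isolates the actual coding error, which is what the metric was supposed to measure.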
The Netflix post does not specify which of these Netflix uses internally. See patterns/decoder-side-synthesis-for-compression for why this evaluation problem generalises beyond FGS.
Relationship to the audio metric sibling¶
The audio counterpart is POLQA MOS — see concepts/polqa-mos-metric — a reference-style Mean-Opinion-Score regression on speech quality. Both video and audio fields maintain reference metrics because human perception is the ground truth, and a metric that can be verified against the original is more trustworthy than one that has to guess a reference.
Seen in¶
- sources/2026-03-09-meta-ffmpeg-at-meta-media-processing-at-scale — canonical Meta reference for PSNR/SSIM/VMAF in production video pipelines, including the in-loop livestream variant.
- sources/2025-07-03-netflix-av1scale-film-grain-synthesis-the-awakening — canonical wiki source for why reference metrics break down on synthesis-based codec tools. FGS-synthesized output is sample-wise different from the source grain even when perceptually equivalent.
- sources/2026-04-02-netflix-smarter-live-streaming-vbr-at-scale — canonical wiki source for VMAF as the rung-by-rung decision metric for bitrate ladder re-tuning when switching encoder rate-control modes. Netflix's Live CBR → capped VBR cutover: "we compared CBR and VBR encodes rung by rung and looked at per-stream VMAF. Wherever VBR fell more than about one VMAF point below CBR, we increased its nominal bitrate just enough to close the gap." Offline VMAF analysis + production A/B both agreed on the ≈1-point low-rung regression; the ≈1-point threshold was the trigger for nominal-bitrate lift. See patterns/vmaf-rung-matched-ladder-tuning.
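The rung-matching trigger in the VBR cutover above reduces to a simple per-rung comparison. A sketch, assuming made-up rung names and VMAF scores — only the ≈1-point threshold comes from the post:

```python
def rungs_needing_lift(vmaf_cbr: dict, vmaf_vbr: dict, threshold: float = 1.0) -> dict:
    """Flag rungs where capped VBR falls more than `threshold` VMAF points
    below the CBR baseline — the trigger for raising that rung's nominal bitrate."""
    return {rung: vmaf_cbr[rung] - vmaf_vbr[rung]
            for rung in vmaf_cbr
            if vmaf_cbr[rung] - vmaf_vbr[rung] > threshold}

# Hypothetical per-rung scores; only the low rung regresses past one point.
cbr = {"1080p": 94.0, "720p": 91.5, "432p": 83.0}
vbr = {"1080p": 94.2, "720p": 91.0, "432p": 81.7}
print(rungs_needing_lift(cbr, vbr))
```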
Related¶
- concepts/in-loop-quality-metrics
- concepts/video-transcoding
- concepts/film-grain-synthesis
- concepts/denoise-encode-synthesize
- concepts/bitrate-ladder
- concepts/variable-bitrate-vbr
- concepts/capped-vbr-qvbr
- patterns/decoder-side-synthesis-for-compression
- patterns/vmaf-rung-matched-ladder-tuning
- systems/av1-codec
- systems/aws-elemental-medialive
- concepts/polqa-mos-metric — audio sibling
- systems/ffmpeg