PATTERN
VMAF rung-matched ladder tuning¶
Problem¶
When a streaming service changes its encoder rate-control mode — e.g. CBR → capped VBR — and blindly re-uses the existing bitrate ladder's per-rung nominal bitrates, quality can silently drop on a subset of rungs, typically the lowest ones. The cause is a semantic shift in "nominal bitrate" between modes: a 5 Mbps CBR rung actually emits ≈5 Mbps second-to-second, but a 5 Mbps-nominal VBR rung often averages well below 5 Mbps because the encoder targets quality, not bytes (Source: sources/2026-04-02-netflix-smarter-live-streaming-vbr-at-scale).
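The semantic shift is visible directly in delivered segment sizes. A minimal sketch, with hypothetical segment byte counts and durations (not Netflix data), of how the time-averaged bitrate of a VBR encode can sit well under its nominal while the CBR encode tracks it:

```python
def achieved_kbps(segment_bytes, segment_seconds):
    """Time-averaged bitrate actually delivered, in kbps."""
    return sum(segment_bytes) * 8 / 1000 / sum(segment_seconds)

# 5 Mbps-nominal CBR rung: every 2 s segment carries ~1.25 MB.
cbr_segments = [1_250_000] * 10
# Same rung as capped VBR: easy scenes get far fewer bytes.
vbr_segments = [1_250_000, 400_000, 350_000, 1_200_000, 300_000,
                380_000, 1_100_000, 420_000, 360_000, 1_250_000]
durations = [2.0] * 10

print(achieved_kbps(cbr_segments, durations))  # 5000.0 kbps — matches the nominal
print(achieved_kbps(vbr_segments, durations))  # 2804.0 kbps — well below the nominal
```

The same nominal number thus buys very different average byte rates under the two modes, which is exactly why carrying the ladder over unchanged risks quality regressions.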
Netflix's Live CBR → VBR cutover hit this concretely:
"When we first applied VBR using the existing CBR ladder, offline analysis with VMAF (a perceptual video quality metric) confirmed the concern from the WWE example: time-averaged quality dropped slightly on a few streams, especially at the lowest bitrates. Early A/B tests showed the same pattern: overall VMAF about one point lower than CBR, with most of the gap at the bottom of the ladder."
The low rungs are hit worst because they have the least headroom between target quality and compression floor — when VBR spends less on the easy scenes, it can't claw quality back via per-scene over-spend the way the higher rungs can.
Solution¶
Re-tune the ladder rung by rung against VMAF. For each rung:
- Encode representative content with the new rate-control mode (capped VBR) at the old rung's nominal.
- Compute per-stream VMAF for the new encode and compare it with the old (CBR) rung's VMAF as the reference point.
- If VBR falls more than ≈1 VMAF point below CBR on that rung, raise the rung's VBR nominal bitrate just enough to close the gap.
- Leave rungs that already match CBR quality at their old nominals.
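The per-rung loop above can be sketched as follows. This is a hedged illustration, not Netflix's implementation: `encode_vbr` and `vmaf` are hypothetical stand-ins for an encoder invocation and a VMAF computation, and `STEP_KBPS` is an assumed lift increment; the ≈1-point threshold is from the source.

```python
GAP = 1.0        # Netflix's ~1 VMAF point threshold
STEP_KBPS = 100  # hypothetical lift increment

def retune_rung(nominal_kbps, cbr_vmaf, encode_vbr, vmaf, max_lifts=50):
    """Raise a rung's VBR nominal just enough to close the VMAF gap to CBR.

    Rungs already within GAP of the CBR baseline are returned unchanged.
    """
    bitrate = nominal_kbps
    for _ in range(max_lifts):
        if cbr_vmaf - vmaf(encode_vbr(bitrate)) <= GAP:
            return bitrate       # rung matches CBR quality (within threshold)
        bitrate += STEP_KBPS     # smallest lift that closes the gap
    return bitrate
```

Because the loop stops at the first bitrate that closes the gap, a rung that already matches CBR quality keeps its old nominal, mirroring the last step above.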
Netflix's execution (Source: sources/2026-04-02-netflix-smarter-live-streaming-vbr-at-scale):
"To fix this, we compared CBR and VBR encodes rung by rung and looked at per-stream VMAF. Wherever VBR fell more than about one VMAF point below CBR, we increased its nominal bitrate just enough to close the gap. Higher-bitrate streams, where VBR quality was already very close to CBR, were left largely unchanged, including the 8 Mbps stream from the figure."
Why "just enough" matters¶
Over-lifting nominal bitrates would erase the efficiency gain that motivated the rate-control change in the first place. The discipline is to close the quality gap and nothing more:
- Rungs with no measurable VMAF regression → no change.
- Rungs with ≤1 VMAF point regression → no change (within noise / not worth the bitrate cost).
- Rungs with >1 VMAF point regression → lift nominal until the regression falls to ≤ ≈1 VMAF point.
The result is a ladder where quality matches the CBR baseline rung-by-rung but average bytes per title still drop because VBR spends fewer bits on easy scenes across all rungs.
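The three-way rule collapses to a single comparison. A minimal sketch (the 1.0 default reflects Netflix's choice; the threshold is a tunable, not a universal constant):

```python
def rung_action(cbr_vmaf: float, vbr_vmaf: float, threshold: float = 1.0) -> str:
    """Decide whether a rung's VBR nominal needs lifting."""
    regression = cbr_vmaf - vbr_vmaf
    if regression <= threshold:  # covers no regression and within-noise cases
        return "keep nominal"
    return "lift nominal"
```

Note that the first two bullets are the same action: any regression at or under the threshold leaves the nominal alone.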
Why offline VMAF + production A/B both¶
Netflix used offline VMAF analysis (on reference encodes of representative content) to propose the per-rung lifts, and production A/B tests on different Live events to verify that the proposed lifts behaved as expected on real traffic (Source: sources/2026-04-02-netflix-smarter-live-streaming-vbr-at-scale). Offline analysis catches the systematic effect cheaply; A/B catches content + network-distribution specifics the offline harness can miss.
When to apply¶
- Any rate-control-mode migration on a production bitrate ladder (CBR → VBR, VBR → capped-VBR, capped-VBR → CAE / per-shot, codec version bump).
- Any time the "nominal bitrate" semantic changes between old and new encoder (e.g. switching from average-target to quality-target objectives).
- Codec swaps where old and new codecs have different quality-at-matched-bitrate curves and the ladder structure is carried over.
What this requires¶
- A reference quality metric — VMAF is the default for streaming video; PSNR / SSIM are weaker substitutes (see concepts/visual-quality-metric).
- Representative content for offline runs that covers the dynamic range the ladder needs to perform on (easy, medium, hard).
- A/B infrastructure to validate under real traffic.
Caveats¶
- Reference metrics break down on synthesis-based codec tools like AV1 Film Grain Synthesis — see concepts/visual-quality-metric. For those tools, perceptual side-by-side comparisons or denoised-signal reference metrics are required.
- The ≈1-point threshold is Netflix's choice, not a universal constant. Different services tune it differently, depending on how much quality regression they will accept in exchange for bitrate savings.
- Per-title optimised ladders complicate this: if each title has its own ladder, rung-by-rung tuning becomes rung-by-rung-per-content-class tuning.
Seen in¶
- sources/2026-04-02-netflix-smarter-live-streaming-vbr-at-scale — canonical wiki source; Netflix Live CBR → capped VBR ladder re-tuning using rung-by-rung VMAF with ≈1-point gap threshold; offline analysis + production A/B agreed.