
AV1 layered coding

Definition

AV1 layered coding is a feature of AV1's Main profile that lets a single bitstream carry multiple layers of video content: an independently decodable base layer plus enhancement layers that can be added or swapped at delivery time without re-encoding the base layer.

The canonical use case Netflix is actively evaluating (Source: sources/2025-12-05-netflix-av1-now-powering-30-of-netflix-streaming) is live-sports graphics overlay: main content in the base layer, graphics overlay in the enhancement layer, with the enhancement layer swapped per market / per sponsor / per language without re-encoding the base layer.

From the Netflix post: "Layered coding is supported in AV1's main profile, allowing encoding the main content in the base layer, and graphics in the enhancement layer, and easily swapping out one version of the enhancement layer with another. We envision that the use of AV1's layered coding can greatly simplify the live streaming workflow and reduce delivery costs."

What layered coding buys you

Without layered coding, graphics overlay for live sports is done in one of three ways, all expensive:

  1. Bake the overlay into the source before encoding. One encode per (content × overlay-variant) pair. For N markets × M sponsors × L languages you get N·M·L encodes. At live-event scale this explodes quickly — e.g. 10 markets × 3 sponsor rotations × 5 language combos = 150 encodes per live stream, each at the full ladder of resolutions and bitrates.
  2. Render the overlay client-side after decode. Needs metadata channels for overlay positioning and content, and every client device has to implement the overlay renderer consistently. Various services have tried this, with mixed device-coverage outcomes.
  3. Split the content into two HLS/DASH streams and have the player composite them. Expensive to deliver (double the segment count) and fragile under ABR.

With layered coding on AV1:

  • One base-layer encode per content variant (the actual game / event).
  • Many cheap enhancement-layer encodes — overlay is usually small graphics over a transparent field, so encode cost is a fraction of the base layer.
  • Swap at delivery time — CDN or origin selects the enhancement layer based on request context (market, language, subscription tier) and concatenates the chosen enhancement-layer bitstream with the single base-layer bitstream into the delivered segment.
  • Collapses the N·M·L encode matrix to N + (M·L or similar) — linear in the number of variants rather than multiplicative.
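The economics of the collapse can be sketched as a back-of-envelope calculation. The cost figures below are illustrative assumptions (an overlay encode at 5% of a full-ladder encode), not measured numbers:

```python
# Back-of-envelope sketch of the encode-matrix collapse described above.
# Cost units are arbitrary; 1.0 = one full-ladder encode of the event.

def baked_in_cost(markets, sponsors, languages, full_encode=1.0):
    # Baked-in overlay: every (market x sponsor x language) combination
    # needs its own full-ladder encode.
    return markets * sponsors * languages * full_encode

def layered_cost(markets, sponsors, languages,
                 full_encode=1.0, overlay_encode=0.05):
    # Layered coding: one base-layer encode for the event, plus one cheap
    # enhancement-layer encode per overlay variant (small graphics over
    # a mostly transparent field).
    variants = markets * sponsors * languages
    return full_encode + variants * overlay_encode

print(baked_in_cost(10, 3, 5))  # 150.0 full-encode units
print(layered_cost(10, 3, 5))   # 8.5 units: 1 base + 150 cheap overlays
```

Even with the same number of overlay variants, the cost is dominated by a single base-layer encode rather than the multiplicative matrix.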

Why this is hard without layered coding

Arbitrary "swap out one version of the enhancement layer with another" requires the base and enhancement layers to be decoder-addressable as separate streams within the bitstream. The codec has to natively support emitting, carrying, and independently decoding multiple layers.
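AV1 makes layers decoder-addressable at the OBU (open bitstream unit) level: each OBU header can carry an extension byte whose `temporal_id` and `spatial_id` fields identify its layer, so a delivery-side filter can keep or drop OBUs by layer id. A minimal header-parsing sketch (simplified; real segment handling also needs OBU size fields and temporal delimiters):

```python
def parse_obu_header(first_two_bytes: bytes):
    """Return (obu_type, temporal_id, spatial_id) from an AV1 OBU header.

    Per the AV1 bitstream spec, byte 0 is: forbidden bit, obu_type (4 bits),
    obu_extension_flag, obu_has_size_field, reserved. If the extension flag
    is set, byte 1 is: temporal_id (3 bits), spatial_id (2 bits), reserved.
    An OBU without an extension header belongs to layer (0, 0)."""
    b0 = first_two_bytes[0]
    obu_type = (b0 >> 3) & 0x0F        # bits 1..4
    extension_flag = (b0 >> 2) & 0x01  # bit 5
    if extension_flag:
        ext = first_two_bytes[1]
        temporal_id = (ext >> 5) & 0x07
        spatial_id = (ext >> 3) & 0x03
    else:
        temporal_id = spatial_id = 0
    return obu_type, temporal_id, spatial_id

# Example: an OBU_FRAME (type 6) with an extension header, spatial_id = 1
header = bytes([(6 << 3) | (1 << 2), 1 << 3])
print(parse_obu_header(header))  # (6, 0, 1)
```

This is why "swap the enhancement layer" can be a delivery-time bitstream operation rather than a re-encode: enhancement-layer OBUs are distinguishable by id without decoding anything.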

AV1's main profile does support this — it's inherited from the scalable-video-coding lineage (SVC in H.264, SHVC in HEVC) but with AV1's efficiency baseline.

Layered coding on AV1 can in principle carry:

  • Temporal layers — frame-rate scalability (base layer 30fps + enhancement to 60fps).
  • Spatial layers — resolution scalability (base layer 720p + enhancement to 1080p or 4K).
  • Quality (SNR) layers — bitrate scalability (lower quality base + enhancement to higher quality).
  • Auxiliary-content layers — the graphics-overlay case Netflix is actively evaluating.
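The temporal-layer case above can be illustrated with a simple dyadic two-layer structure (an assumed pattern for illustration, not a mandated one): in a 60 fps stream, alternate frames carry `temporal_id` 0 (the 30 fps base) and `temporal_id` 1 (the enhancement). Because layer-1 frames only reference layer-0 frames, dropping every layer-1 frame leaves a valid 30 fps stream:

```python
def temporal_ids(num_frames: int) -> list[int]:
    # Two-layer dyadic assignment: even frames form the 30 fps base layer,
    # odd frames the enhancement layer that doubles the rate to 60 fps.
    return [i % 2 for i in range(num_frames)]

frames = temporal_ids(8)
base_only = [i for i, tid in enumerate(frames) if tid == 0]
print(frames)     # [0, 1, 0, 1, 0, 1, 0, 1]
print(base_only)  # [0, 2, 4, 6] -> half the frames, half the frame rate
```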

Netflix's 2025-12 post focuses on the auxiliary-content case as the live-streaming opportunity. The temporal / spatial / quality scalability cases are in-scope for cloud gaming (see systems/av1-codec → cloud-gaming section) where a network-responsive rung of the ladder in the same bitstream would reduce handshake overhead for ABR switches.

Live streaming economics

The post explicitly names the two structural wins for live streaming at Netflix scale:

  • Hyper-scale concurrent viewership — AV1's baseline compression efficiency reduces the bandwidth required per viewer (a third less than HEVC, per the fleet-wide datum; see concepts/rebuffering-rate), which matters when tens of millions of concurrent viewers are pulling the stream.
  • Customisable graphics overlay — layered coding lets the same underlying event feed carry different overlays per market / per sponsor / per context without paying the N·M·L encode blowup.

The result is "greatly simplif[ying] the live streaming workflow and reduc[ing] delivery costs". Netflix frames this as architectural direction, not shipped feature — see patterns/layered-coding-for-graphics-overlay.

Generalisation

The pattern — carry multiple independently-addressable content layers in one bitstream so a CDN / delivery layer can swap them per-request — generalises beyond live sports:

  • Localisation layers — base audio in one language, enhancement audio per language (audio codecs have a parallel mechanism).
  • Branding layers — base content + per-distributor logo / bug / watermark overlay.
  • Interactive overlays — base content + viewer-chosen overlay variant (stats in / stats out, PiP camera / no PiP).

The common shape is that CDN-side swap of a small enhancement layer over a shared base layer beats both client-side rendering (device coverage problem) and per-variant re-encoding (cost explosion).
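The common shape reduces to a small request handler: pick an enhancement-layer bitstream by request context and combine it with the shared base layer. A minimal sketch; `segment_store` and `Request` are hypothetical names, and real AV1 layer muxing interleaves OBUs per temporal unit rather than appending whole streams:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Request:
    market: str
    language: str

# Hypothetical per-segment store: one shared base-layer bitstream,
# many small enhancement-layer variants keyed by request context.
segment_store = {
    "base": b"<base-layer OBUs>",
    ("US", "en"): b"<overlay US/en OBUs>",
    ("BR", "pt"): b"<overlay BR/pt OBUs>",
}

def serve_segment(req: Request) -> bytes:
    # Delivery-time swap: the base layer is never re-encoded; only the
    # enhancement layer varies per request. Missing variant -> base only.
    base = segment_store["base"]
    overlay = segment_store.get((req.market, req.language), b"")
    return base + overlay  # simplified: real muxing is per temporal unit

print(serve_segment(Request("BR", "pt")))
```

The same handler shape covers the branding, localisation, and interactive-overlay cases: only the keying context changes.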
