
CONCEPT Cited by 1 source

Auto-regressive grain model

Definition

An auto-regressive (AR) grain model is a compact parametric generator for noise patterns used by AV1 Film Grain Synthesis. A small set of AR coefficients {a₀, a₁, a₂, …, aₙ} — plus a white-Gaussian-noise drive — describes how each noise sample is generated as a linear combination of previously-synthesized neighbours. From those coefficients the decoder generates a 64×64 noise template which is then tiled onto decoded frames in random 32×32 patches (Source: sources/2025-07-03-netflix-av1scale-film-grain-synthesis-the-awakening).
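The template-and-patch scheme can be sketched as below. This is a minimal illustration assuming a uniform random patch offset into the template; the function name and the offset scheme are assumptions for the sketch, not the AV1 spec's exact block-placement procedure.

```python
import numpy as np

def tile_grain(template, height, width, patch=32, seed=0):
    """Tile a noise template onto a (height, width) grain plane by copying
    randomly offset patch x patch blocks out of the template.

    Illustrative sketch: the spec's own pseudo-random placement differs."""
    rng = np.random.default_rng(seed)
    grain = np.empty((height, width))
    max_off = template.shape[0] - patch  # largest valid patch origin
    for y in range(0, height, patch):
        for x in range(0, width, patch):
            oy, ox = rng.integers(0, max_off + 1, size=2)  # random patch origin
            block = template[oy:oy + patch, ox:ox + patch]
            h, w = min(patch, height - y), min(patch, width - x)  # clip at edges
            grain[y:y + h, x:x + w] = block[:h, :w]
    return grain

# One 64x64 template serves a whole frame-sized grain plane.
frame_grain = tile_grain(np.random.default_rng(7).normal(size=(64, 64)), 72, 128)
```

Because every 32×32 block is drawn from a different offset, the tiling avoids the visible repetition a naive copy of the same template would produce.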

AR models are not new to signal processing; what is architecturally significant is that transmitting the AR coefficients across the wire instead of the grain texture itself costs kilobytes instead of megabytes — the compression win that makes film-grain synthesis viable.

Mechanics (per Netflix Fig. 1)

The simplest AR kernel uses lag parameter L=1: each noise value is calculated as "a linear combination of previously synthesized noise sample values, with AR coefficients a₀, a₁, a₂, a₃ and a white Gaussian noise (wgn) component". Higher lag values L=2, L=3 … use more neighbouring samples and produce finer or more structured grain patterns. The encoder chooses an AR order appropriate for the grain texture of the source content.

                          (simplified AR kernel, L=1)
  ┌───┬───┬───┐
  │ a₀│ a₁│ a₂│           wgn ────► Σ ────► n(x, y)
  ├───┼───┼───┤                     ▲
  │ a₃│ • │   │                     │
  └───┴───┴───┘           a₀·n(x-1,y-1) + a₁·n(x,y-1) + a₂·n(x+1,y-1) + a₃·n(x-1,y)
     template

The • cell is the sample currently being generated; the aᵢ cells are previously-synthesized neighbours weighted by the transmitted coefficients. The encoder-side task is to estimate AR coefficients such that the synthesized noise template has the spatial correlation structure of the source-video grain — that is, that the grain looks statistically like what was filmed.
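The raster-order generation the kernel diagram describes can be sketched as follows. This is a minimal illustration of the simplified L=1 kernel; the coefficient values, noise variance, and border seeding are assumptions for the sketch, not AV1 bitstream defaults.

```python
import numpy as np

def synthesize_template(coeffs, size=64, sigma=1.0, seed=0):
    """Generate a noise template with the simplified L=1 AR kernel:
    n(x,y) = a0*n(x-1,y-1) + a1*n(x,y-1) + a2*n(x+1,y-1) + a3*n(x-1,y) + wgn.

    Border handling is an assumption: pad with white noise so the kernel
    always has causal neighbours."""
    a0, a1, a2, a3 = coeffs
    rng = np.random.default_rng(seed)
    n = np.zeros((size + 1, size + 2))            # one-sample pad: top + sides
    n[0, :] = rng.normal(0.0, sigma, size + 2)    # seed top row with white noise
    for y in range(1, size + 1):                  # raster order, top to bottom
        n[y, 0] = rng.normal(0.0, sigma)          # seed left border
        for x in range(1, size + 1):
            n[y, x] = (a0 * n[y - 1, x - 1] + a1 * n[y - 1, x]
                       + a2 * n[y - 1, x + 1] + a3 * n[y, x - 1]
                       + rng.normal(0.0, sigma))  # white-Gaussian drive
    return n[1:, 1:size + 1]                      # strip the padding

# Illustrative coefficients (sum < 1 keeps the process stable).
template = synthesize_template([0.1, 0.25, 0.1, 0.25])
```

Larger coefficients produce coarser, more correlated grain; coefficients near zero leave the template close to plain white noise.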

Estimation from the residual

The AR coefficients are estimated on the encoder side from the residual between the source video and the denoised video: source − denoised = captured grain. The AR model is fit to that captured grain (Source: sources/2025-07-03-netflix-av1scale-film-grain-synthesis-the-awakening). This makes the AR model a statistical description of the grain in the source — a compressed summary of what kind of noise is there, not which exact noise sample was there.

Two consequences:

  • The synthesized grain is not sample-wise identical to the source grain. It is statistically indistinguishable to the human eye (that is the design intent) but reference metrics like VMAF / PSNR / SSIM — see concepts/visual-quality-metric — do not know that and will score it as distorted.
  • The grain parameters can be re-used across frames as long as the source grain remains statistically stationary. The AR model captures a style of grain, not a per-frame instance.
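The encoder-side estimation described above can be expressed as an ordinary least-squares fit of the simplified L=1 kernel to the captured grain (source − denoised). This is an illustrative fit, not the reference encoder's actual routine.

```python
import numpy as np

def fit_ar_coeffs(grain):
    """Least-squares fit of the simplified L=1 kernel to a captured-grain
    plane (source minus denoised). Returns (a0, a1, a2, a3)."""
    # Each interior sample is predicted from its four causal neighbours.
    target = grain[1:, 1:-1].ravel()
    neighbours = np.stack([
        grain[:-1, :-2].ravel(),   # n(x-1, y-1) -> a0
        grain[:-1, 1:-1].ravel(),  # n(x,   y-1) -> a1
        grain[:-1, 2:].ravel(),    # n(x+1, y-1) -> a2
        grain[1:, :-2].ravel(),    # n(x-1, y)   -> a3
    ], axis=1)
    coeffs, *_ = np.linalg.lstsq(neighbours, target, rcond=None)
    return coeffs
```

Fitting to pure white noise should recover coefficients near zero, since white noise has no spatial correlation for the kernel to capture; structured grain yields non-trivial coefficients.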

Why this compresses so well

The compression win comes from transmitting the generator instead of the signal:

  Representation                                 Approx. size per title
  ─────────────────────────────────────────────  ──────────────────────
  Grain as encoded texture                       megabytes
  Grain as AR coefficients + scaling function    kilobytes

And the synthesized grain, tiled from a 64×64 template in 32×32 patches, is cheap to evaluate on commodity decoders — the decoder-side cost of grain reconstruction is a first-class design constraint. See concepts/grain-intensity-scaling-function for the companion intensity model, and patterns/decoder-side-synthesis-for-compression for the general architectural pattern.
