Auto-regressive grain model¶
Definition¶
An auto-regressive (AR) grain model is a compact
parametric generator for noise patterns used by
AV1 Film Grain Synthesis.
A small set of AR coefficients {a₀, a₁, a₂, …, aₙ} —
plus a white-Gaussian-noise drive — describes how each noise
sample is generated as a linear combination of previously
synthesized neighbours. From those coefficients the decoder
generates a 64×64 noise template which is then tiled onto
decoded frames in random 32×32 patches (Source:
sources/2025-07-03-netflix-av1scale-film-grain-synthesis-the-awakening).
AR models are not new to signal processing; what is architecturally significant is that transmitting the AR coefficients across the wire instead of the grain texture itself costs kilobytes instead of megabytes — the compression win that makes film-grain synthesis viable.
Mechanics (per Netflix Fig. 1)¶
The simplest AR kernel uses lag parameter L=1: each noise value is calculated as "a linear combination of previously synthesized noise sample values, with AR coefficients a₀, a₁, a₂, a₃ and a white Gaussian noise (wgn) component". Higher lag values L=2, L=3 … use more neighbouring samples and produce finer or more structured grain patterns. The encoder chooses an AR order appropriate for the grain texture of the source content.
(simplified AR kernel, L=1)
┌───┬───┬───┐
│ a₀│ a₁│ a₂│        wgn ────► Σ ────► n(x, y)
├───┼───┼───┤                  ▲
│ a₃│ • │   │                  │
└───┴───┴───┘    a₀·n(x-1,y-1) + a₁·n(x,y-1) + a₂·n(x+1,y-1) + a₃·n(x-1,y)
  template
The • cell is the sample currently being generated; the
aᵢ cells are previously synthesized neighbours weighted by
the transmitted coefficients. The encoder-side task is to
estimate AR coefficients such that the synthesized noise
template has the spatial correlation structure of the
source-video grain — that is, that the grain looks
statistically like what was filmed.
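The L=1 recursion above can be sketched in a few lines of NumPy. This is an illustrative toy, not the AV1 routine: the function name, the example coefficient values, and the zero-valued border (used so every causal neighbour index is valid) are assumptions for this sketch.

```python
import numpy as np

def synthesize_grain_template(coeffs, size=64, sigma=1.0, seed=0):
    """Toy lag-1 AR grain synthesis.

    coeffs = (a0, a1, a2, a3) weight the four causal neighbours
    n(x-1,y-1), n(x,y-1), n(x+1,y-1), n(x-1,y); each new sample
    also receives an independent white-Gaussian-noise (wgn) drive.
    A one-sample zero border stands in for the real padding scheme.
    """
    a0, a1, a2, a3 = coeffs
    rng = np.random.default_rng(seed)
    n = np.zeros((size, size))
    for y in range(1, size - 1):
        for x in range(1, size - 1):
            n[y, x] = (a0 * n[y - 1, x - 1]   # top-left neighbour
                       + a1 * n[y - 1, x]     # top neighbour
                       + a2 * n[y - 1, x + 1] # top-right neighbour
                       + a3 * n[y, x - 1]     # left neighbour
                       + sigma * rng.standard_normal())  # wgn drive
    return n

# example coefficients (hypothetical values, for illustration only)
template = synthesize_grain_template((0.05, 0.20, 0.05, 0.20))
```

Because each sample depends only on already-generated neighbours (a causal scan order), the whole 64×64 template is produced in a single raster pass.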
Estimation from the residual¶
The AR coefficients are estimated on the encoder side from the residual between the source video and the denoised video: source − denoised = captured grain. The AR model is fit to that captured grain (Source: sources/2025-07-03-netflix-av1scale-film-grain-synthesis-the-awakening). This makes the AR model a statistical description of the grain in the source — a compressed summary of what kind of noise is there, not which exact noise sample was there.
Two consequences:
- The synthesized grain is not sample-wise identical to the source grain. It is statistically indistinguishable to the human eye (that is the design intent) but reference metrics like VMAF / PSNR / SSIM — see concepts/visual-quality-metric — do not know that and will score it as distorted.
- The grain parameters can be re-used across frames as long as the source grain remains statistically stationary. The AR model captures a style of grain, not a per-frame instance.
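The estimation step can be sketched as an ordinary least-squares regression: each interior sample of the captured grain is the target, and its four causal neighbours are the predictors. This is a textbook fit under the lag-1 layout above, not the actual AV1 encoder routine; the function name and the white-noise demo input are assumptions.

```python
import numpy as np

def estimate_ar_coeffs(grain):
    """Least-squares fit of lag-1 AR coefficients to captured grain.

    grain = source - denoised (the residual the encoder works from).
    One regression row per interior sample: the four causal
    neighbours as predictors, the current sample as the target.
    """
    h, w = grain.shape
    rows, targets = [], []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            rows.append([grain[y - 1, x - 1], grain[y - 1, x],
                         grain[y - 1, x + 1], grain[y, x - 1]])
            targets.append(grain[y, x])
    coeffs, *_ = np.linalg.lstsq(np.array(rows), np.array(targets),
                                 rcond=None)
    return coeffs  # (a0, a1, a2, a3)

# demo on pure white noise: with no spatial correlation in the
# input, the fitted coefficients should come out near zero
coeffs = estimate_ar_coeffs(
    np.random.default_rng(0).standard_normal((64, 64)))
```

The fit recovers the correlation structure, not the exact samples, which is precisely why the synthesized grain is statistically similar but not sample-wise identical to the source grain.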
Why this compresses so well¶
The compression win comes from transmitting the generator instead of the signal:
| Representation | Approx size per title |
|---|---|
| Grain as encoded texture | megabytes |
| Grain as AR coefficients + scaling function | kilobytes |
And the AR-generated synthesized grain, tiled from a 64×64 template in 32×32 patches, is cheap to evaluate on commodity decoders — the decoder-side cost of grain reconstruction is a first-class design constraint. See concepts/grain-intensity-scaling-function for the companion intensity model, and patterns/decoder-side-synthesis-for-compression for the general architectural pattern.
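The decoder-side tiling step is cheap enough to sketch directly: for each 32×32 block of the output, cut a random 32×32 window from the 64×64 template. This ignores AV1's block-boundary blending, chroma planes, and the intensity-scaling step; the function name and parameters are assumptions for this sketch.

```python
import numpy as np

def tile_grain(frame_h, frame_w, template, patch=32, seed=0):
    """Tile a grain template onto a frame in random patch-sized cuts.

    Each patch of the output copies a randomly positioned window
    from the template, so no two patches are forced to repeat the
    same texture even though only one 64x64 template exists.
    """
    rng = np.random.default_rng(seed)
    t = template.shape[0]
    grain = np.zeros((frame_h, frame_w))
    for y in range(0, frame_h, patch):
        for x in range(0, frame_w, patch):
            oy = rng.integers(0, t - patch + 1)  # random window origin
            ox = rng.integers(0, t - patch + 1)
            h = min(patch, frame_h - y)          # clip at frame edges
            w = min(patch, frame_w - x)
            grain[y:y + h, x:x + w] = template[oy:oy + h, ox:ox + w]
    return grain

grain = tile_grain(128, 192, np.zeros((64, 64)))
```

Note the asymmetry that makes the scheme work: only the small coefficient set crosses the wire, while the loops above run on the decoder, where they amount to a handful of copies per block.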
Seen in¶
- sources/2025-07-03-netflix-av1scale-film-grain-synthesis-the-awakening — canonical wiki source; includes Fig. 1 walkthrough of the AR-kernel synthesis process at lag L=1.