CONCEPT Cited by 1 source

Long-Term Reference frame

Long-Term Reference (LTR) is an error-resilience mechanism where a video encoder stores selected reference frames in its buffer longer than regular references and can produce LTRP (LTR-Predicted) frames that reference them on demand. When the normal decode chain breaks due to packet loss, an LTRP frame predicted from a previously-decoded LTR instantly resynchronizes sender and receiver — recovering without the cost of a full keyframe.

Why LTR over keyframes

When packet loss breaks the prediction chain, the standard recovery is a keyframe. But keyframes are ~10× larger than typical P-frames, causing:

  • Network congestion from the bitrate spike.
  • More packet loss from the congestion.
  • A problematic feedback cycle of loss → keyframe → more loss.

Recovering with an LTRP is more efficient: an LTRP is still an inter-frame (not intra-only), just predicted from an older, stable reference. The trade-off is weaker temporal correlation (older reference → less accurate motion prediction), which Meta mitigates by marking periodic higher-quality frames as LTR (Source: sources/2026-06-22-meta-adopting-av1-for-real-time-communication-rtc-at-scale).

Architecture (Meta AV1 RTC)

Encoder side

  • The encoder periodically emits LTR frames, pinning them in a bounded reference buffer of size 4.
  • When the buffer is full, the oldest LTR is evicted.
  • An explicit LTR indicator (binary flag in a proprietary RTP header extension) signals the network layer which frames are LTR — necessary because AV1 doesn't distinguish LTR/non-LTR in bitstream syntax (unlike H.264).
  • The frame_id is exposed to the network layer via LTR bitstream syntax.
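
The bounded buffer with oldest-first eviction can be sketched as follows. This is a minimal illustration, not Meta's implementation: the class name LtrBuffer and its methods are hypothetical, and only the capacity of 4 and the evict-oldest policy come from the text above.

```python
from collections import OrderedDict

LTR_BUFFER_SIZE = 4  # buffer size stated in the text


class LtrBuffer:
    """Hypothetical bounded store of pinned LTR frame_ids, oldest first."""

    def __init__(self, capacity=LTR_BUFFER_SIZE):
        self.capacity = capacity
        self._slots = OrderedDict()  # frame_id -> placeholder for frame data

    def pin(self, frame_id):
        """Pin a new LTR. Returns the evicted frame_id, or None if there was room."""
        evicted = None
        if frame_id not in self._slots and len(self._slots) >= self.capacity:
            # Buffer is full: drop the oldest pinned LTR.
            evicted, _ = self._slots.popitem(last=False)
        self._slots[frame_id] = None
        return evicted

    def frame_ids(self):
        """Pinned frame_ids, oldest first."""
        return list(self._slots)


buf = LtrBuffer()
for fid in (10, 20, 30, 40):
    buf.pin(fid)
evicted = buf.pin(50)  # the fifth LTR pushes out the oldest one
```

An OrderedDict keeps insertion order, so the oldest entry is always at the front and eviction is a single popitem call.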

Network feedback

  • The receiver sends ACK feedback via a separate proprietary RTP header extension, carrying the frame_id of the received LTR.
  • The encoder always uses the most recently ACKed LTR as prediction reference for LTRP frames.
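
A rough sketch of the sender-side bookkeeping this implies is shown below. The AckTracker class and its method names are hypothetical, and "most recently ACKed" is interpreted here as the LTR whose ACK arrived last; the source does not say whether arrival order or frame order is meant.

```python
class AckTracker:
    """Hypothetical tracker of which pinned LTRs the receiver has acknowledged."""

    def __init__(self):
        self._pinned = set()    # frame_ids currently held in the LTR buffer
        self._ack_order = []    # ACKed frame_ids, in order of ACK arrival

    def on_pinned(self, frame_id):
        self._pinned.add(frame_id)

    def on_evicted(self, frame_id):
        # An evicted LTR can no longer serve as a reference.
        self._pinned.discard(frame_id)
        if frame_id in self._ack_order:
            self._ack_order.remove(frame_id)

    def on_ack(self, frame_id):
        """Record an ACK. Returns False for ACKs of frames no longer pinned."""
        if frame_id not in self._pinned:
            return False
        if frame_id in self._ack_order:
            self._ack_order.remove(frame_id)
        self._ack_order.append(frame_id)
        return True

    def reference_for_ltrp(self):
        """frame_id of the most recently ACKed LTR, or None if there is none."""
        return self._ack_order[-1] if self._ack_order else None


tracker = AckTracker()
for fid in (1, 2, 3):
    tracker.on_pinned(fid)
tracker.on_ack(1)
tracker.on_ack(3)
ref = tracker.reference_for_ltrp()
```

Only ACKed LTRs are eligible because the sender must be sure the receiver actually decoded the frame it is about to predict from.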

Two recovery triggers

  1. Reactive recovery — Receiver experiences a freeze → sends RPSI (Reference Picture Selection Indication) → encoder produces LTRP.
  2. Proactive protection — Sender detects elevated packet loss via feedback channel → requests periodic LTRPs preemptively. Somewhat redundant but significantly reduces freezes.

If no ACKed LTR is available in the buffer, the encoder falls back to a keyframe.
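
The two triggers and the keyframe fallback can be combined into one per-frame decision, sketched below under stated assumptions. The function choose_frame_type, the loss_threshold and ltrp_interval parameters, and their default values are all hypothetical; the source gives no numbers. The sketch also assumes the keyframe fallback applies only to reactive recovery, and that a proactive request with no ACKed LTR simply continues with regular P-frames.

```python
from enum import Enum


class FrameType(Enum):
    P = "P"
    LTRP = "LTRP"
    KEYFRAME = "KEYFRAME"


def choose_frame_type(rpsi_received, loss_rate, frames_since_ltrp, acked_ltr,
                      loss_threshold=0.05, ltrp_interval=30):
    """Return (frame_type, reference_frame_id) for the next frame.

    acked_ltr is the frame_id of the most recently ACKed LTR, or None.
    loss_threshold and ltrp_interval are illustrative values, not from the source.
    """
    reactive = rpsi_received
    proactive = loss_rate >= loss_threshold and frames_since_ltrp >= ltrp_interval

    if reactive:
        # Receiver reported a freeze: recover from an ACKed LTR if one exists,
        # otherwise fall back to a full keyframe.
        if acked_ltr is not None:
            return FrameType.LTRP, acked_ltr
        return FrameType.KEYFRAME, None

    if proactive and acked_ltr is not None:
        # Elevated loss: insert a periodic LTRP before a freeze happens.
        return FrameType.LTRP, acked_ltr

    # Normal operation (assumption: proactive mode never forces a keyframe).
    return FrameType.P, None


decision = choose_frame_type(rpsi_received=True, loss_rate=0.0,
                             frames_since_ltrp=0, acked_ltr=7)
```

Checking the reactive trigger first reflects that a reported freeze is the more urgent condition; the proactive branch only adds redundancy on top of normal prediction.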
