CONCEPT Cited by 1 source

Generative retrieval¶

Definition¶

Generative retrieval is a recommendation / search architecture in which the retrieval stage generates the identifier of the next relevant item token by token with an autoregressive decoder, instead of scoring every candidate against the request. The item identifier is typically a sequence of codewords from a learned Semantic ID codebook (see RQ-VAE), constructed so that items sharing a codeword prefix are semantically similar.
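As a rough illustration of how a codeword sequence can carry prefix-sharing similarity, the sketch below applies plain residual quantization to toy 2-D item vectors. The codebooks, item vectors, and function names are hand-written assumptions for illustration; a real RQ-VAE learns its codebooks, and the source does not describe Instacart's encoder at this level of detail.

```python
# Toy residual quantization: each level quantizes what the previous level left over.
# CODEBOOKS and ITEMS are hypothetical values chosen for illustration only.

def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )

def semantic_id(vec, codebooks):
    """Encode vec as one codeword index per level, coarse to fine."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen codeword; the next level encodes the remainder.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(codes)

CODEBOOKS = [
    [(0.0, 0.0), (10.0, 10.0)],  # level 0: coarse neighbourhood
    [(-1.0, 0.0), (1.0, 0.0)],   # level 1: refinement inside the neighbourhood
]

ITEMS = {
    "banana": (0.9, 0.1),
    "apple": (-0.8, 0.2),
    "detergent": (10.9, 10.0),
}

ids = {name: semantic_id(vec, CODEBOOKS) for name, vec in ITEMS.items()}
for name, sid in ids.items():
    print(name, sid)
```

In this toy setup the two fruit items share their first codeword while the detergent does not, which is the property a decoder can exploit when it generates IDs prefix first.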

Quote (Source: sources/2026-06-02-instacart-from-scoring-to-spelling-rebuilding-ads-retrieval-at-instacart):

"We rebuilt the system, by moving from an encoder that scores products to a generative model that spells them out, token by token."

Scoring vs generation¶

Two architectures, two cost structures, two failure modes:

Axis	Scoring retrieval	Generative retrieval
Output shape	Probability over full vocabulary	Token-by-token sequence
Vocabulary	Atomic item IDs	Semantic IDs (codeword sequences)
Inference primitive	Top-K from scored vocabulary	Beam search over codeword positions
Vocabulary growth	Catalog-bounded (bottleneck)	Codebook-bounded
Cold-start (new items)	Hard — needs transaction history	Easy — codebook covers all items
Coherence within candidate set	Flat — laundry detergent in a breakfast cart	Hierarchical — autoregressive prefix conditioning constrains beam
Tunable dial	Top-K threshold	Beam width + temperature
Compute cost	O(vocab size) per request	O(beam_width × decode_steps × decoder_compute)
Production examples	Pinterest two-tower CGs, Meta SilverTorch (in-graph index)	TIGER (Google), Spotify GLIDE/NEO, YouTube PLUM, Instacart 2026-06
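The "Vocabulary growth" and "Compute cost" rows can be made concrete with a little arithmetic. The sketch below uses assumed numbers (256 codewords per level, 3 levels, beam width 10); none of these figures come from the source, and the evaluation count is only an upper bound that ignores batching.

```python
# Back-of-the-envelope arithmetic for codebook-bounded vocabularies.
# All constants are illustrative assumptions, not figures from the source.

def addressable_ids(codebook_size, levels):
    """Distinct Semantic IDs expressible with `levels` codewords per item."""
    return codebook_size ** levels

def decoder_step_evaluations(beam_width, decode_steps):
    """Upper bound on decoder evaluations per request: one per live beam per step."""
    return beam_width * decode_steps

CODEBOOK_SIZE = 256  # assumed codewords per level
LEVELS = 3           # assumed codewords per item
BEAM_WIDTH = 10      # assumed beam width

total_ids = addressable_ids(CODEBOOK_SIZE, LEVELS)
evaluations = decoder_step_evaluations(BEAM_WIDTH, LEVELS)

print(f"addressable IDs: {total_ids}")
print(f"logits per decode step: {CODEBOOK_SIZE}")
print(f"decoder evaluations per request (upper bound): {evaluations}")
```

Under these assumptions the decoder's output layer stays at 256 logits per step while the ID space covers roughly 16.8 million combinations, whereas an atomic-ID scorer would need one output per catalog item.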

Why generation specifically¶

The 2026-06 Instacart source articulates three structural ceilings of scoring that generation dissolves:

Vocabulary bottleneck — scoring a fixed atomic-ID vocabulary forces a model-size / sparsity / coverage trade-off. "The model constructs the semantic representation of the next item on the fly, avoiding the memory and latency penalties that previously restricted our catalog coverage."
Cold-start hurdle — atomic-ID models memorise co-occurrences; new products without history can't be retrieved. Semantic IDs give every product a codebook position from day 1.
Structural drift — flat probability distributions across a heterogeneous vocabulary leak across semantic neighbourhoods. "Generating auto regressively means each codeword is explicitly conditioned on the previous one. This enforces a strict hierarchy during retrieval. If the model begins generating a prefix for 'Produce,' the beam search remains confined to that semantic neighborhood."
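The confinement described in the last point can be sketched with a beam search that only extends prefixes found in a trie of valid Semantic IDs. This is a minimal sketch under stated assumptions: the catalog, the codeword names, and the `TOY_LOGITS` table are hypothetical, the stand-in decoder ignores the request entirely, and the source does not say whether Instacart constrains decoding with a trie.

```python
import math

# Hypothetical catalog of valid Semantic IDs, written as readable codewords.
VALID_IDS = [
    ("produce", "fruit", "banana"),
    ("produce", "fruit", "apple"),
    ("produce", "veg", "carrot"),
    ("household", "laundry", "detergent"),
]

# Hypothetical fixed logits standing in for a trained decoder.
TOY_LOGITS = {
    "produce": 2.0, "household": 0.0, "fruit": 1.0, "veg": 0.5,
    "laundry": 0.0, "banana": 1.0, "apple": 0.8, "carrot": 0.3,
    "detergent": 0.0,
}

def build_trie(ids):
    """Nested dicts: each key is a codeword, each value the subtree below it."""
    trie = {}
    for sid in ids:
        node = trie
        for token in sid:
            node = node.setdefault(token, {})
    return trie

def step_log_probs(allowed):
    """Log-softmax over the allowed next codewords only (invalid ones are masked)."""
    logits = [TOY_LOGITS[t] for t in allowed]
    log_z = math.log(sum(math.exp(x) for x in logits))
    return {t: x - log_z for t, x in zip(allowed, logits)}

def beam_search(trie, beam_width, steps):
    """Return up to beam_width (sequence, log_prob) pairs, best first."""
    beams = [((), 0.0, trie)]
    for _ in range(steps):
        candidates = []
        for prefix, score, node in beams:
            allowed = sorted(node)
            if not allowed:
                continue  # this prefix is already a complete ID
            log_probs = step_log_probs(allowed)
            for token in allowed:
                candidates.append((prefix + (token,), score + log_probs[token], node[token]))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return [(prefix, score) for prefix, score, _ in beams]

TRIE = build_trie(VALID_IDS)
for sequence, log_prob in beam_search(TRIE, beam_width=2, steps=3):
    print(sequence, round(log_prob, 3))
```

With a beam width of 2, both surviving sequences stay under the "produce" prefix, because the "household" branch is pruned once the first codeword is scored. Widening the beam to 4 lets the detergent ID back in.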

Tunable dials — beam width and temperature¶

A scoring model has one knob: top-K. A generative retriever has two:

Beam width controls how many candidate sequences are tracked at each decode step. Wider beam = more candidate diversity.
Temperature controls the entropy of the token distribution at each step. Higher temperature = more exploration; lower = more exploitation.

The source describes how the two dials are used together: "Unlike scoring models, the generative approach unlocks direct tuning mechanisms through beam width and temperature sampling. These serve as precise levers to balance intent and exploration — allowing us to dial up strict precision on search pages, while turning up brand diversity and discovery on post-checkout surfaces." See concepts/diversity-via-beam-and-temperature.
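The effect of temperature on the per-step token distribution can be checked numerically. The sketch below applies a temperature-scaled softmax to a made-up logit vector and compares entropies; the logits and function names are illustrative assumptions, not values from the source.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature, computed with the max-subtraction trick."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

LOGITS = [3.0, 1.0, 0.5, 0.0]  # hypothetical next-codeword logits

sharp = softmax_with_temperature(LOGITS, 0.5)    # exploitation
neutral = softmax_with_temperature(LOGITS, 1.0)
flat = softmax_with_temperature(LOGITS, 2.0)     # exploration

for name, probs in [("T=0.5", sharp), ("T=1.0", neutral), ("T=2.0", flat)]:
    print(name, [round(p, 3) for p in probs], "entropy:", round(entropy(probs), 3))
```

Lower temperatures concentrate probability mass on the top codeword, and higher temperatures spread it out, so sampled sequences wander further from the single most likely ID.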

Sibling architectures in the broader retrieval design space¶

The wiki covers four retrieval paradigms, of which generative retrieval is one:

Two-tower / dual-encoder — asymmetric pre-compute: item embeddings indexed offline, query embedding computed once per request, scoring via dot-product over the ANN index.
Index as Model (Meta SilverTorch 2026-05-26) — items live as a tensor inside the retrieval model graph; the cross-service hop disappears but the scoring paradigm is preserved.
Sequence-model scoring — Pinterest contextual sequential CG, Instacart's prior CR — Transformer-based two-tower models with sequence inputs that still output a probability distribution over the full atomic-ID vocabulary.
Generative retrieval — TIGER, Spotify GLIDE/NEO, YouTube PLUM, Instacart 2026-06 — abandons scoring entirely; recommendation becomes autoregressive sequence generation.

The 2026-05-26 SilverTorch source and the 2026-06-02 Instacart source are architecturally orthogonal alternatives to "score every item against the request": SilverTorch keeps two-tower asymmetric pre-compute but absorbs the index into the model graph; Instacart abandons two-tower / ANN entirely and replaces it with autoregressive generation.
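For contrast, the scoring primitive that the two-tower and index-as-model designs share can be sketched as a dot product against every item embedding followed by top-K selection. The embeddings and names below are hypothetical, and the exhaustive scan stands in for the ANN index a production system would use.

```python
import heapq

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def top_k(query_vec, item_vecs, k):
    """Score every item against the query and keep the k best item IDs."""
    return heapq.nlargest(k, item_vecs, key=lambda item_id: dot(query_vec, item_vecs[item_id]))

# Hypothetical precomputed item-tower embeddings.
ITEM_VECS = {
    "banana": (1.0, 0.0),
    "apple": (0.9, 0.1),
    "detergent": (0.0, 1.0),
}

QUERY_VEC = (1.0, 0.2)  # hypothetical query-tower output for one request

result = top_k(QUERY_VEC, ITEM_VECS, 2)
print(result)
```

Every item in `ITEM_VECS` is touched once per request here, which is the per-request cost that grows with the catalog and that generation replaces with a fixed number of decode steps.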

When NOT to use generative retrieval¶

Conditions under which scoring retrieval remains the right choice:

Item identifiers don't have learnable structure. Generative retrieval depends on a meaningful codebook (RQ-VAE over rich item features). Without it, the codeword vocabulary is arbitrary and the prefix-sharing benefit disappears.
Latency budget is too tight for autoregressive decoding. Decoding cost scales with sequence length × beam width; for ultra-low-latency surfaces (sub-millisecond ad serving), scoring may still win.
Tail precision matters more than diversity. Generative retrieval shines when the request is broad and the win is brand / item diversity. For narrow-intent surgical retrieval (e.g. a specific search query for a known brand), scoring + reranking may still win.
No GPU serving substrate available. As Instacart explicitly notes, the legacy "Python and CPU inference" stack is "not viable" — generative retrieval requires a GPU stack.

Caveats¶

This is a young paradigm: the TIGER paper dates from 2023, and the production deployments at Spotify, YouTube, and Instacart all date from 2024-2026.
Long-term stability of the codebook across re-training / catalog drift is not yet well-characterised in published work.
The post-decode mapping layer (Instacart's retailer-partitioned index) is essential for resolving a generated SID to real products; without it, a single generated SID could fan out to many products with no ranking discipline.
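One way to picture the post-decode mapping layer is a lookup keyed first by retailer and then by SID, with each bucket kept in ranked order. The sketch below is purely hypothetical: the field names, the `score` used for ordering, and the per-SID cap are assumptions, since the source only states that the index is retailer-partitioned.

```python
# Hypothetical SID-to-product index, partitioned by retailer.

def build_index(products):
    """Group products by retailer, then by SID, sorted by descending score."""
    index = {}
    for product in products:
        by_sid = index.setdefault(product["retailer"], {})
        by_sid.setdefault(product["sid"], []).append(product)
    for by_sid in index.values():
        for bucket in by_sid.values():
            bucket.sort(key=lambda p: p["score"], reverse=True)
    return index

def resolve(index, retailer, sids, per_sid_limit):
    """Map generated SIDs to product IDs for one retailer, capping the fan-out per SID."""
    resolved = []
    for sid in sids:
        bucket = index.get(retailer, {}).get(sid, [])
        resolved.extend(p["product_id"] for p in bucket[:per_sid_limit])
    return resolved

# Made-up products; several can share one SID.
PRODUCTS = [
    {"product_id": "p1", "retailer": "r1", "sid": (0, 1), "score": 0.9},
    {"product_id": "p2", "retailer": "r1", "sid": (0, 1), "score": 0.4},
    {"product_id": "p3", "retailer": "r2", "sid": (0, 1), "score": 0.8},
    {"product_id": "p4", "retailer": "r1", "sid": (1, 1), "score": 0.5},
]

INDEX = build_index(PRODUCTS)
print(resolve(INDEX, "r1", [(0, 1), (1, 1)], per_sid_limit=1))
```

The per-SID cap and the sort order are where ranking discipline would live in this sketch; dropping them would return every product that happens to share the generated SID.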

Seen in¶

sources/2026-06-02-instacart-from-scoring-to-spelling-rebuilding-ads-retrieval-at-instacart — first canonical wiki disclosure of generative retrieval as a production-shipped paradigm at Instacart's scale; defines the "scoring vs spelling" framing.

systems/tiger-generative-retrieval — the reference paper.
systems/instacart-generative-ads-retrieval — wiki-disclosed production system.
systems/instacart-semantic-ids / systems/rq-vae — the vocabulary-substrate components.
concepts/semantic-id / concepts/beam-search-retrieval / concepts/vocabulary-bottleneck — sibling concepts.
concepts/two-tower-architecture / concepts/index-as-model / concepts/retrieval-ranking-funnel — alternative retrieval paradigms in the design space.
patterns/generative-over-scoring-retrieval — the canonical pattern.