CONCEPT Cited by 1 source
Generative retrieval
Definition
Generative retrieval is a recommendation / search architecture in which the retrieval stage generates the identifier of the next relevant item token by token with an autoregressive decoder, instead of scoring every candidate against the request. The item identifier is typically a sequence of codewords from a learned Semantic ID codebook (see RQ-VAE), built so that items sharing a codeword prefix are semantically similar.
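The prefix-sharing property can be illustrated with the residual-quantization step that RQ-VAE builds on. This is a minimal sketch, not Instacart's implementation: the codebooks below are hand-written and hypothetical (a real RQ-VAE learns them together with an encoder), and the item vectors and names (`apple`, `pear`, `soap`) are made up for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def nearest(codebook: Sequence[Vector], v: Vector) -> int:
    """Index of the codebook entry closest to v (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)),
    )


def semantic_id(v: Vector, codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Quantize v level by level; each level encodes the residual left by the previous one."""
    residual = v
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen codeword so the next level refines what is left over.
        residual = tuple(r - c for r, c in zip(residual, codebook[idx]))
    return codes


# Hypothetical codebooks: level 0 is coarse, level 1 is a finer correction.
CODEBOOKS = [
    [(0.0, 0.0), (10.0, 10.0)],
    [(-1.0, -1.0), (1.0, 1.0)],
]

apple = semantic_id((9.0, 9.2), CODEBOOKS)    # hypothetical item embedding
pear = semantic_id((11.0, 10.9), CODEBOOKS)   # close to apple in embedding space
soap = semantic_id((0.8, 1.1), CODEBOOKS)     # far from both
```

In this toy setup the two nearby items share their first codeword and differ only in the second, while the distant item gets a different first codeword. That shared prefix is the structure an autoregressive decoder can exploit.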
Quote (Source: sources/2026-06-02-instacart-from-scoring-to-spelling-rebuilding-ads-retrieval-at-instacart):
"We rebuilt the system, by moving from an encoder that scores products to a generative model that spells them out, token by token."
Scoring vs generation
Two architectures, two cost structures, two failure modes:
| Axis | Scoring retrieval | Generative retrieval |
|---|---|---|
| Output shape | Probability over full vocabulary | Token-by-token sequence |
| Vocabulary | Atomic item IDs | Semantic IDs (codeword sequences) |
| Inference primitive | Top-K from scored vocabulary | Beam search over codeword positions |
| Vocabulary growth | Catalog-bounded (bottleneck) | Codebook-bounded |
| Cold-start (new items) | Hard — needs transaction history | Easy — codebook covers all items |
| Coherence within candidate set | Flat — laundry detergent in a breakfast cart | Hierarchical — autoregressive prefix conditioning constrains beam |
| Tunable dial | Top-K threshold | Beam width + temperature |
| Compute cost | O(vocab size) per request | O(beam_width × decode_steps × decoder_compute) |
| Production examples | Pinterest two-tower CGs, Meta SilverTorch (in-graph index) | TIGER (Google), Spotify GLIDE/NEO, YouTube PLUM, Instacart 2026-06 |
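For contrast, the scoring column of the table can be sketched as follows. This is an illustrative toy, assuming a dot-product scorer over a three-item catalog; the names `score_all`, `top_k`, and `CATALOG` and all embedding values are hypothetical. Production systems typically use an ANN index instead of an exhaustive scan, but the sketch shows the per-item scoring that generation avoids.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def score_all(query: Sequence[float], items: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Dot-product score for every item in the catalog; work grows with catalog size."""
    return {
        item_id: sum(q * x for q, x in zip(query, vec))
        for item_id, vec in items.items()
    }


def top_k(query: Sequence[float], items: Dict[str, Sequence[float]], k: int) -> List[Tuple[str, float]]:
    """Keep the k highest-scoring items; k is the single tunable dial."""
    scores = score_all(query, items)
    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])


# Hypothetical atomic-ID catalog with made-up embeddings.
CATALOG = {
    "oat_milk": (0.9, 0.1),
    "granola": (0.8, 0.3),
    "detergent": (0.1, 0.9),
}

result = top_k((1.0, 0.0), CATALOG, k=2)
```

Every catalog entry is touched once per request, and a new item cannot be retrieved until it has an embedding row of its own.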
Why generation specifically
The 2026-06 Instacart source articulates three structural ceilings of scoring that generation dissolves:
- Vocabulary bottleneck — scoring a fixed atomic-ID vocabulary forces a model-size / sparsity / coverage trade-off. "The model constructs the semantic representation of the next item on the fly, avoiding the memory and latency penalties that previously restricted our catalog coverage."
- Cold-start hurdle — atomic-ID models memorise co-occurrences; new products without history can't be retrieved. Semantic IDs give every product a codebook position from day 1.
- Structural drift — flat probability distributions across a heterogeneous vocabulary leak across semantic neighbourhoods. "Generating auto regressively means each codeword is explicitly conditioned on the previous one. This enforces a strict hierarchy during retrieval. If the model begins generating a prefix for 'Produce,' the beam search remains confined to that semantic neighborhood."
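The prefix conditioning described in the last bullet can be sketched with a small beam search. This is a toy under stated assumptions: `toy_model` stands in for the autoregressive decoder and returns hand-picked conditional log-probabilities, and codeword 0 at the first position is imagined to mean "Produce". None of these names or numbers come from the source.

```python
import math
from typing import Callable, Dict, List, Tuple

Prefix = Tuple[int, ...]


def beam_search(
    next_logprobs: Callable[[Prefix], Dict[int, float]],
    steps: int,
    beam_width: int,
) -> List[Tuple[Prefix, float]]:
    """Extend every kept prefix by one codeword per step; keep the best beam_width."""
    beams: List[Tuple[Prefix, float]] = [((), 0.0)]
    for _ in range(steps):
        candidates = [
            (prefix + (token,), score + lp)
            for prefix, score in beams
            for token, lp in next_logprobs(prefix).items()
        ]
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams


def toy_model(prefix: Prefix) -> Dict[int, float]:
    """Hypothetical decoder: the second codeword's distribution depends on the first."""
    if prefix == ():
        return {0: math.log(0.7), 1: math.log(0.3)}  # 0 = "Produce", 1 = "Household"
    if prefix[0] == 0:
        return {0: math.log(0.6), 1: math.log(0.4)}
    return {0: math.log(0.5), 1: math.log(0.5)}


beams = beam_search(toy_model, steps=2, beam_width=2)
sequences = [prefix for prefix, _ in beams]
```

With these toy probabilities both surviving sequences start with codeword 0, so the final candidate set stays inside one semantic neighbourhood even though the other prefix was still in the beam after the first step.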
Tunable dials: beam width and temperature
A scoring model has one knob: top-K. A generative retriever has two:
- Beam width controls how many candidate sequences are tracked at each decode step. Wider beam = more candidate diversity.
- Temperature controls the entropy of the token distribution at each step. Higher temperature = more exploration; lower = more exploitation.
The source frames the two dials as levers for trading precision against exploration: "Unlike scoring models, the generative approach unlocks direct tuning mechanisms through beam width and temperature sampling. These serve as precise levers to balance intent and exploration — allowing us to dial up strict precision on search pages, while turning up brand diversity and discovery on post-checkout surfaces." See concepts/diversity-via-beam-and-temperature.
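The effect of temperature on a single decode step can be sketched with the standard temperature-scaled softmax. Whether Instacart applies temperature in exactly this way is an assumption; the logits below are made up for illustration.

```python
import math
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Divide logits by the temperature, then normalize with a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max before exponentiating to avoid overflow
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; higher means a flatter, more exploratory distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


LOGITS = [2.0, 1.0, 0.0]  # hypothetical codeword logits at one decode step

sharp = softmax_with_temperature(LOGITS, 0.5)    # low temperature: exploitation
neutral = softmax_with_temperature(LOGITS, 1.0)
flat = softmax_with_temperature(LOGITS, 2.0)     # high temperature: exploration
```

Lowering the temperature concentrates probability mass on the top codeword, while raising it spreads mass across alternatives. Beam width then decides how many of those alternatives are carried forward.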
Sibling architectures in the broader retrieval design space
Generative retrieval is one of four retrieval paradigms covered on the wiki:
- Two-tower / dual-encoder — asymmetric pre-compute: item embeddings indexed offline, query embedding computed once per request, scoring via dot-product over the ANN index.
- Index as Model (Meta SilverTorch 2026-05-26) — items live as a tensor inside the retrieval model graph; the cross-service hop disappears but the scoring paradigm is preserved.
- Sequence-model scoring — Pinterest contextual sequential CG, Instacart's prior CR — Transformer-based two-tower models with sequence inputs that still output a probability distribution over the full atomic-ID vocabulary.
- Generative retrieval — TIGER, Spotify GLIDE/NEO, YouTube PLUM, Instacart 2026-06 — abandons scoring entirely; recommendation becomes autoregressive sequence generation.
The 2026-05-26 SilverTorch source and the 2026-06-02 Instacart source are architecturally orthogonal alternatives to "score every item against the request": SilverTorch keeps two-tower asymmetric pre-compute but absorbs the index into the model graph; Instacart abandons two-tower / ANN entirely and replaces it with autoregressive generation.
When NOT to use generative retrieval
Conditions under which scoring retrieval remains the right choice:
- Item identifiers don't have learnable structure. Generative retrieval depends on a meaningful codebook (RQ-VAE over rich item features). Without it, the codeword vocabulary is arbitrary and the prefix-sharing benefit disappears.
- Latency budget is too tight for autoregressive decoding. Decoding cost scales with sequence length × beam width; for ultra-low-latency surfaces (sub-millisecond ad serving), scoring may still win.
- Tail precision matters more than diversity. Generative retrieval shines when the request is broad and the win is brand / item diversity. For narrow-intent surgical retrieval (e.g. a specific search query for a known brand), scoring + reranking may still win.
- No GPU serving substrate available. As Instacart explicitly notes, the legacy "Python and CPU inference" stack is "not viable" — generative retrieval requires a GPU stack.
Caveats
- This is a young paradigm: the TIGER paper dates from 2023, and the production deployments at Spotify / YouTube / Instacart all date from 2024-2026.
- Long-term stability of the codebook across re-training / catalog drift is not yet well-characterised in published work.
- The post-decode mapping layer (Instacart's retailer-partitioned index) is essential for attributing a generic SID to real products; without it, a generated SID could fan out to many products with no ranking discipline.
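The role of the mapping layer in the last caveat can be sketched as a lookup partitioned by retailer. This is a hypothetical structure, not Instacart's actual index: the class `RetailerSidIndex`, the `priority` field used for ordering, and the retailer and SKU names are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SID = Tuple[int, ...]


class RetailerSidIndex:
    """Maps (retailer, Semantic ID) to concrete products, ordered by a priority score."""

    def __init__(self) -> None:
        self._index: Dict[str, Dict[SID, List[Tuple[float, str]]]] = defaultdict(
            lambda: defaultdict(list)
        )

    def add(self, retailer: str, sid: SID, product_id: str, priority: float) -> None:
        self._index[retailer][sid].append((priority, product_id))

    def resolve(self, retailer: str, sid: SID, limit: int) -> List[str]:
        """Return at most `limit` products for this retailer, highest priority first."""
        entries = self._index.get(retailer, {}).get(sid, [])
        ranked = sorted(entries, key=lambda entry: entry[0], reverse=True)
        return [product_id for _, product_id in ranked[:limit]]


index = RetailerSidIndex()
index.add("retailer_a", (0, 1), "sku_1", priority=0.2)
index.add("retailer_a", (0, 1), "sku_2", priority=0.9)
index.add("retailer_b", (0, 1), "sku_3", priority=0.5)

best_a = index.resolve("retailer_a", (0, 1), limit=1)
all_b = index.resolve("retailer_b", (0, 1), limit=5)
missing = index.resolve("retailer_c", (0, 1), limit=5)
```

The same generated SID resolves to different products per retailer, and the `limit` plus ordering keep one SID from fanning out into an unranked product list.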
Seen in
- sources/2026-06-02-instacart-from-scoring-to-spelling-rebuilding-ads-retrieval-at-instacart — first canonical wiki disclosure of generative retrieval as a production-shipped paradigm at Instacart's scale; defines the "scoring vs spelling" framing.
Related
- systems/tiger-generative-retrieval — the reference paper.
- systems/instacart-generative-ads-retrieval — wiki-disclosed production system.
- systems/instacart-semantic-ids / systems/rq-vae — the vocabulary-substrate components.
- concepts/semantic-id / concepts/beam-search-retrieval / concepts/vocabulary-bottleneck — sibling concepts.
- concepts/two-tower-architecture / concepts/index-as-model / concepts/retrieval-ranking-funnel — alternative retrieval paradigms in the design space.
- patterns/generative-over-scoring-retrieval — the canonical pattern.