CONCEPT Cited by 1 source
Diversity via beam width and temperature
Definition
In generative retrieval systems, beam width and temperature are runtime knobs that let one model serve multiple surfaces with different precision-vs-exploration trade-offs without retraining. A wider beam yields more candidate diversity; a higher temperature yields more exploration in the per-step token distribution.
Quote (Source: sources/2026-06-02-instacart-from-scoring-to-spelling-rebuilding-ads-retrieval-at-instacart):
"Unlike scoring models, the generative approach unlocks direct tuning mechanisms through beam width and temperature sampling. These serve as precise levers to balance intent and exploration — allowing us to dial up strict precision on search pages, while turning up brand diversity and discovery on post-checkout surfaces."
Why this is a structural advantage
A scoring retrieval model has one knob: the top-K cutoff. Two fundamental limitations follow:
- Top-K cannot encode exploration. It is a precision-truncation knob, not an exploration knob. Raising K only extends the same ranked list, so it surfaces more of the same kinds of items, not different categories of items.
- Per-surface tuning requires per-surface models. If search needs precision and post-checkout needs exploration, scoring retrieval requires two trained models (or at minimum two re-ranking heads).
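A minimal sketch of why top-K behaves as a truncation knob, assuming a scoring model that assigns one fixed relevance score per item (the item names and scores are hypothetical). Because the ranking is fixed, raising K never reorders anything; it only appends the next items in the same order, so low-ranked items appear only after everything ranked above them.

```python
import heapq

def top_k(scores: dict[str, float], k: int) -> list[str]:
    """Return the k highest-scoring item IDs, best first."""
    return heapq.nlargest(k, scores, key=scores.__getitem__)

# Hypothetical relevance scores from a scoring retrieval model.
scores = {
    "brand_a_cola": 0.91,
    "brand_a_diet_cola": 0.88,
    "brand_a_cola_12pk": 0.85,
    "brand_b_sparkling_water": 0.40,
    "brand_c_kombucha": 0.35,
}

top3 = top_k(scores, 3)
top4 = top_k(scores, 4)
# Raising K keeps the first K items and appends the next-highest one.
print(top3)
print(top4)
```

The nesting `top_k(scores, k)[:j] == top_k(scores, j)` holds for any `j <= k`, which is the sense in which K trades precision for recall along one fixed ranking.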
Generative retrieval, by exposing two orthogonal runtime knobs — beam width (output-side) and temperature (per-step distribution shaping) — lets the same model serve surfaces with different intent profiles:
| Surface | Intent profile | Beam width | Temperature |
|---|---|---|---|
| Search page | Strict precision (user knows what they want) | Narrow | Low |
| Retailer home page | Broad discovery | Wider | Moderate |
| Pre-checkout | Cart-completion + brand-exploration | Wider | Higher |
| Post-checkout | Maximum brand diversity / exploration | Widest | Highest |
(Specific values per surface are not disclosed in the source.)
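The per-surface dispatch can be sketched as a lookup of decoding settings, with one shared model behind every surface. All numeric values and the names `DecodeConfig`, `SURFACE_CONFIGS`, and `decode_config_for` are hypothetical; only their relative ordering mirrors the table above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DecodeConfig:
    beam_width: int     # parallel prefixes kept per decoding step
    temperature: float  # divisor applied to per-step logits (> 0)

# Hypothetical values: the source does not disclose production settings.
# Only the relative ordering mirrors the surface table in the text.
SURFACE_CONFIGS: dict[str, DecodeConfig] = {
    "search": DecodeConfig(beam_width=8, temperature=0.5),
    "retailer_home": DecodeConfig(beam_width=32, temperature=1.0),
    "pre_checkout": DecodeConfig(beam_width=32, temperature=1.3),
    "post_checkout": DecodeConfig(beam_width=64, temperature=1.6),
}

def decode_config_for(surface: str) -> DecodeConfig:
    """Return runtime decoding knobs for a surface; model weights are shared."""
    return SURFACE_CONFIGS[surface]

print(decode_config_for("post_checkout"))
```

Changing a surface's behaviour then means editing a config entry, not retraining or deploying a second model.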
How the two knobs differ
Beam width is a structural knob — it changes how many parallel paths the decoder explores at each step. Wider beam:
- Produces more distinct full SID sequences per request.
- Each retained sequence is still among the highest-scoring candidates at every step, so the extra paths are the next-best ones rather than random ones.
- Increases compute cost roughly linearly.
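A toy beam search makes the beam-width behaviour concrete. It assumes a hypothetical decoder (`toy_logits`, a seeded random stand-in) over semantic IDs of `LEVELS` tokens drawn from a codebook of `VOCAB` codes; neither shape comes from the source. Counting decoder calls shows the roughly linear cost: with `beam_width <= VOCAB`, the search makes `1 + (LEVELS - 1) * beam_width` calls.

```python
import math
import random

VOCAB = 16   # hypothetical codebook size at each semantic-ID level
LEVELS = 3   # hypothetical number of tokens in a semantic ID

def toy_logits(prefix: tuple[int, ...]) -> list[float]:
    """Hypothetical decoder stand-in: deterministic logits for each prefix."""
    rng = random.Random(hash(prefix))
    return [rng.gauss(0.0, 2.0) for _ in range(VOCAB)]

def log_softmax(logits: list[float]) -> list[float]:
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def beam_search(beam_width: int) -> tuple[list[tuple[tuple[int, ...], float]], int]:
    """Deterministic beam search; returns (beams, number of decoder calls)."""
    beams = [((), 0.0)]  # (prefix, cumulative log-probability)
    decoder_calls = 0
    for _ in range(LEVELS):
        candidates = []
        for prefix, score in beams:
            decoder_calls += 1  # one forward pass per live prefix
            for token, lp in enumerate(log_softmax(toy_logits(prefix))):
                candidates.append((prefix + (token,), score + lp))
        # Keep the beam_width highest cumulative scores for the next step.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams, decoder_calls

for width in (4, 8, 16):
    beams, calls = beam_search(width)
    print(width, len(beams), calls)
```

For example, `beam_search(4)[1]` is 9 and `beam_search(8)[1]` is 17: doubling the beam roughly doubles decoder work, while every returned sequence is a distinct full semantic ID.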
Temperature is a distribution-shaping knob — it changes the softmax temperature applied to the per-step logits. Higher temperature:
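- Flattens the distribution at each step, so lower-probability options become more likely to be sampled into the beam. This effect depends on sampling (the source speaks of "temperature sampling"): a deterministic top-B pick from one step's distribution is unaffected, because dividing logits by a positive temperature preserves their order.
- Doesn't change the number of sequences explored, but does change which sequences make it into the beam.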
- Adds essentially no compute cost.
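The distribution-shaping effect can be checked numerically with a temperature-scaled softmax over hypothetical logits. Raising the temperature raises entropy and lifts the probability of the lowest-ranked option, while leaving the rank order of options untouched, which is why temperature changes beam contents only when candidates are sampled.

```python
import math

def softmax(logits: list[float], temperature: float) -> list[float]:
    """Softmax of logits / temperature; temperature must be > 0."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs: list[float]) -> float:
    return -sum(p * math.log(p) for p in probs if p > 0)

def rank_order(probs: list[float]) -> list[int]:
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

logits = [4.0, 2.0, 1.0, 0.0]  # hypothetical per-step logits
for t in (0.5, 1.0, 2.0):
    p = softmax(logits, t)
    print(f"T={t}: probs={[round(x, 3) for x in p]}, "
          f"entropy={entropy(p):.3f}, order={rank_order(p)}")
```

The work per step is one division per logit plus the softmax the decoder already computes, consistent with temperature adding essentially no compute.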
Composing them:
- Low temperature + wide beam: many paths through the high-probability region (deep, precision-focused exploration).
- High temperature + narrow beam: few paths through a broader region of the distribution (random-walk-style exploration).
- High temperature + wide beam: both at once, for maximum diversity.
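One way to let both knobs act together is to sample the beam instead of taking its deterministic top-B: perturb each candidate's cumulative tempered log-probability with Gumbel noise and keep the top `beam_width` (the Gumbel-top-k trick for sampling without replacement). This is a simplified sketch, not the source's decoder; the toy decoder, the semantic-ID shape, and the use of the first code as a coarse-category proxy are all assumptions.

```python
import math
import random

VOCAB, LEVELS = 16, 3  # hypothetical semantic-ID shape

def toy_logits(prefix: tuple[int, ...]) -> list[float]:
    """Hypothetical decoder stand-in: deterministic logits per prefix."""
    rng = random.Random(hash(prefix))
    return [rng.gauss(0.0, 2.0) for _ in range(VOCAB)]

def log_softmax(logits: list[float], temperature: float) -> list[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - lse for s in scaled]

def gumbel(rng: random.Random) -> float:
    u = rng.random()
    while u == 0.0:  # random() is in [0, 1); Gumbel needs u in (0, 1)
        u = rng.random()
    return -math.log(-math.log(u))

def sampled_beam_search(beam_width: int, temperature: float, seed: int = 0):
    """Beam search whose per-step selection samples without replacement."""
    rng = random.Random(seed)
    beams = [((), 0.0)]  # (prefix, cumulative tempered log-probability)
    for _ in range(LEVELS):
        scored = []
        for prefix, score in beams:
            for tok, lp in enumerate(log_softmax(toy_logits(prefix), temperature)):
                new_score = score + lp
                # Noise decides selection; the stored score stays noise-free.
                scored.append((new_score + gumbel(rng), prefix + (tok,), new_score))
        scored.sort(reverse=True)
        beams = [(seq, s) for _, seq, s in scored[:beam_width]]
    return beams

def first_level_diversity(beams) -> int:
    """Distinct first codes among results (assumed proxy for coarse category)."""
    return len({seq[0] for seq, _ in beams})

for width, temp in [(4, 0.5), (16, 0.5), (4, 2.0), (16, 2.0)]:
    result = sampled_beam_search(width, temp)
    print(width, temp, first_level_diversity(result))
```

Width sets how many distinct IDs come back; temperature sets how far from the mode the sampled selections are allowed to drift.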
Operational outcome at Instacart
The 2026-06 source attributes its retrieval-diversity wins (2.7× more brands and 1.8× more sub-categories, plus category-conditional gains of +421% Alcohol, +396% Beverages and +229% Healthcare) to the structural property that generative retrieval allows directly tunable diversity, complemented by the autoregressive prefix-conditioning property that prevents flat-distribution outlier leakage.
The CTR lift (+5%) and add-to-cart lift (+34%) suggest the diversity is intent-aligned: wider beams aren't just retrieving random brands; they surface brands that match the user's context, which the prior CR model couldn't reach because of its vocabulary bottleneck and co-occurrence-memorisation failure modes.
Caveats
- Specific beam width and temperature values per surface are not disclosed; the source describes the mechanism, not the production tuning.
- Beam width's compute cost grows roughly linearly; the compute envelope the GPU stack can sustain at production beam width is not disclosed.
- The diversity-precision trade-off is not symmetric: a wider beam doesn't cost precision, but it does cost compute. The actual trade-off the post discloses is "compute spent on more candidates vs compute spent on tighter scoring of fewer candidates".
- The wins are reported on browse surfaces (retailer home page and pre-checkout), where diversity is the goal. The strict-precision mode for search pages is forward-looking: not yet shipped, per the post.
Seen in
- sources/2026-06-02-instacart-from-scoring-to-spelling-rebuilding-ads-retrieval-at-instacart — first canonical wiki disclosure of beam width + temperature as runtime tunable diversity dials in production generative retrieval.
Related
- concepts/beam-search-retrieval — the inference primitive.
- concepts/generative-retrieval — the paradigm.
- concepts/semantic-id — the vocabulary substrate.
- systems/instacart-generative-ads-retrieval — the wiki-disclosed production system.