CONCEPT Cited by 1 source

Diversity via beam width and temperature¶

Definition¶

In generative retrieval systems, beam width and temperature are runtime knobs that let one model serve multiple surfaces with different precision-vs-exploration trade-offs without retraining. Wider beam = more candidate diversity. Higher temperature = more exploration in the per-step token distribution.

Quote (Source: sources/2026-06-02-instacart-from-scoring-to-spelling-rebuilding-ads-retrieval-at-instacart):

"Unlike scoring models, the generative approach unlocks direct tuning mechanisms through beam width and temperature sampling. These serve as precise levers to balance intent and exploration — allowing us to dial up strict precision on search pages, while turning up brand diversity and discovery on post-checkout surfaces."

Why this is a structural advantage¶

A scoring retrieval model has one knob: top-K threshold. Two fundamental limitations follow:

Top-K cannot encode exploration. It is a precision-truncation knob, not an exploration knob. Raising K returns a longer prefix of the same ranked list, which tends to surface more of the same kind of item rather than different categories of items.
Per-surface tuning requires per-surface models. If search needs precision and post-checkout needs exploration, scoring retrieval requires two trained models (or at minimum two re-ranking heads).
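The truncation behaviour of top-K can be illustrated with a minimal sketch. The item ids and scores below are hypothetical and stand in for the output of any scoring retrieval model; nothing here is taken from the source.

```python
# Hypothetical relevance scores from a scoring retrieval model (illustrative values only).
SCORES = {
    "cola_brand_a": 0.93,
    "cola_brand_b": 0.91,
    "cola_brand_c": 0.88,
    "sparkling_water": 0.41,
    "trail_mix": 0.12,
}


def top_k(scores, k):
    """Return the k highest-scoring item ids, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


small = top_k(SCORES, 2)
large = top_k(SCORES, 4)

# Raising K only extends the same ranked list:
# the smaller result is always a prefix of the larger one.
assert large[: len(small)] == small
```

In this toy ranking, moving from K=2 to K=4 adds a third cola before anything from another category appears, which is the sense in which K truncates rather than explores.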

Generative retrieval, by exposing two orthogonal runtime knobs — beam width (output-side) and temperature (per-step distribution shaping) — lets the same model serve surfaces with different intent profiles:

Surface	Intent profile	Beam width	Temperature
Search page	Strict precision (user knows what they want)	Narrow	Low
Retailer home page	Broad discovery	Wider	Moderate
Pre-checkout	Cart-completion + brand-exploration	Wider	Higher
Post-checkout	Maximum brand diversity / exploration	Widest	Highest

(Specific values per surface are not disclosed in the source.)
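One way to picture "same model, different surfaces" is a per-surface decoding config that is looked up at request time. This is only a sketch: `DecodeConfig`, `SURFACE_CONFIGS`, `config_for`, and every number below are hypothetical placeholders chosen to respect the ordering in the table, since the source discloses no production values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecodeConfig:
    beam_width: int      # number of parallel decoding paths kept per step
    temperature: float   # softmax temperature applied to per-step logits


# Placeholder values only: the source does not disclose production settings.
SURFACE_CONFIGS = {
    "search": DecodeConfig(beam_width=4, temperature=0.5),
    "retailer_home": DecodeConfig(beam_width=8, temperature=0.8),
    "pre_checkout": DecodeConfig(beam_width=8, temperature=1.0),
    "post_checkout": DecodeConfig(beam_width=16, temperature=1.2),
}


def config_for(surface: str) -> DecodeConfig:
    """Look up the decoding knobs for a surface; the model itself is shared."""
    return SURFACE_CONFIGS[surface]
```

The design point is that switching surfaces changes only the arguments passed to the decoder, not the model weights.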

How the two knobs differ¶

Beam width is a structural knob — it changes how many parallel paths the decoder explores at each step. Wider beam:

Produces more distinct full SID sequences per request.
Each surviving sequence is still among the highest-scoring continuations kept at every step.
Increases decoding compute roughly linearly in the beam width.
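The effect of beam width can be sketched with a toy beam search. The two-step token distributions in `STEP_PROBS` are hypothetical and, for brevity, depend only on the step index; a real generative retriever would condition each step on the query and on the SID prefix decoded so far.

```python
import math

# Hypothetical toy model: one next-token distribution per decoding step.
STEP_PROBS = [
    {"a": 0.6, "b": 0.3, "c": 0.1},
    {"x": 0.7, "y": 0.2, "z": 0.1},
]


def beam_search(beam_width):
    """Return (sequences best first, number of candidate expansions scored)."""
    beams = [((), 0.0)]  # (token sequence, cumulative log-probability)
    scored = 0
    for probs in STEP_PROBS:
        expanded = [
            (seq + (tok,), score + math.log(p))
            for seq, score in beams
            for tok, p in probs.items()
        ]
        scored += len(expanded)
        # Keep only the beam_width best partial sequences.
        expanded.sort(key=lambda item: item[1], reverse=True)
        beams = expanded[:beam_width]
    return [seq for seq, _ in beams], scored


narrow, narrow_cost = beam_search(1)  # one sequence, 6 expansions scored
wide, wide_cost = beam_search(3)      # three distinct sequences, 12 expansions scored
```

In this toy, widening the beam from 1 to 3 returns `("a", "x")`, `("b", "x")`, and `("a", "y")` instead of `("a", "x")` alone, and the work per step is the number of live beams times the vocabulary size.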

Temperature is a distribution-shaping knob — it changes the softmax temperature applied to the per-step logits. Higher temperature:

Flattens the distribution at each step, so lower-probability options become more likely to be selected into the beam.
Doesn't change the number of sequences explored, but does change which sequences make it into the beam.
Adds essentially no compute cost.
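The flattening effect can be seen in a minimal temperature-scaled softmax. The logits below are hypothetical, and `softmax_with_temperature` is an illustrative helper rather than anything named in the source.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax over logits divided by a positive temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical per-step logits
sharp = softmax_with_temperature(logits, 0.5)  # low temperature: mass concentrates on the top token
flat = softmax_with_temperature(logits, 2.0)   # high temperature: mass spreads to the tail
```

Here the top token's probability falls from roughly 0.87 at temperature 0.5 to roughly 0.51 at temperature 2.0. Scaling by a positive temperature never reorders tokens within a single step, so its effect on the beam shows up when candidates are sampled, or when cumulative scores are compared across different prefixes.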

Composing them: low temperature + wide beam = many paths through the high-probability region (deep, precision-focused exploration). High temperature + narrow beam = few paths through a broader region of the distribution (random-walk-style exploration). High temperature + wide beam = both, for maximum diversity.
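A small sketch of how the two knobs interact, assuming a deterministic beam search that ranks prefixes by cumulative temperature-scaled log-probability. The source does not say whether production decoding is deterministic or sampled, and the toy `MODEL` below is entirely hypothetical.

```python
import math

# Hypothetical toy model: next-token probabilities keyed by the prefix decoded so far.
MODEL = {
    (): {"a": 0.7, "b": 0.3},
    ("a",): {"x": 0.52, "y": 0.48},
    ("b",): {"x": 0.9, "y": 0.1},
}


def scaled_log_probs(probs, temperature):
    """Divide log-probabilities by the temperature, then re-normalise."""
    scaled = {tok: math.log(p) / temperature for tok, p in probs.items()}
    log_z = math.log(sum(math.exp(v) for v in scaled.values()))
    return {tok: v - log_z for tok, v in scaled.items()}


def decode(beam_width, temperature, steps=2):
    """Beam search over MODEL using temperature-scaled per-step distributions."""
    beams = [((), 0.0)]  # (token sequence, cumulative scaled log-probability)
    for _ in range(steps):
        expanded = []
        for seq, score in beams:
            for tok, lp in scaled_log_probs(MODEL[seq], temperature).items():
                expanded.append((seq + (tok,), score + lp))
        expanded.sort(key=lambda item: item[1], reverse=True)
        beams = expanded[:beam_width]
    return [seq for seq, _ in beams]


low_t_narrow = decode(beam_width=2, temperature=1.0)
high_t_narrow = decode(beam_width=2, temperature=4.0)
low_t_wide = decode(beam_width=4, temperature=1.0)
```

With these toy numbers, the narrow beam at temperature 1.0 keeps two sequences that both start with `a`, while at temperature 4.0 the same beam width keeps one sequence under `b` and one under `a`. Widening the beam to 4 at temperature 1.0 reaches the `b` branch as well, but only by scoring and returning more candidates.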

Operational outcome at Instacart¶

The 2026-06 source attributes the retrieval-diversity wins (2.7× more brands and 1.8× more sub-categories) and the category-conditional wins (+421% Alcohol, +396% Beverages, +229% Healthcare) to the structural property that generative retrieval allows directly tunable diversity, complemented by the autoregressive prefix-conditioning property that prevents flat-distribution outlier leakage.

The CTR lift (+5%) and add-to-carts lift (+34%) suggest the diversity is intent-aligned — wider beams aren't just retrieving random brands, they're surfacing brands that match user context that the prior CR model couldn't reach due to the vocabulary bottleneck + co-occurrence-memorisation failure modes.

Caveats¶

Specific beam width and temperature values per surface are not disclosed; the source describes the mechanism, not the production tuning.
Beam width's compute cost grows linearly; the compute envelope the GPU stack can sustain at production beam width is not disclosed.
The diversity-precision trade-off is not symmetric — a wider beam doesn't cost precision, but it does cost compute. The actual trade-off the post discloses is "compute spent on more candidates vs compute spent on tighter scoring of fewer candidates".
The wins are reported on browse surfaces (retailer home page + pre-checkout) where diversity is the goal. The "strict precision on search pages" mode is forward-looking and, per the post, not yet shipped.

Seen in¶

sources/2026-06-02-instacart-from-scoring-to-spelling-rebuilding-ads-retrieval-at-instacart — first canonical wiki disclosure of beam width + temperature as runtime tunable diversity dials in production generative retrieval.

concepts/beam-search-retrieval — the inference primitive.
concepts/generative-retrieval — the paradigm.
concepts/semantic-id — the vocabulary substrate.
systems/instacart-generative-ads-retrieval — the wiki-disclosed production system.