CONCEPT Cited by 1 source

Diversity via beam width and temperature

Definition

In generative retrieval systems, beam width and temperature are runtime knobs that let one model serve multiple surfaces with different precision-vs-exploration trade-offs without retraining. Wider beam = more candidate diversity. Higher temperature = more exploration in the per-step token distribution.

Quote (Source: sources/2026-06-02-instacart-from-scoring-to-spelling-rebuilding-ads-retrieval-at-instacart):

"Unlike scoring models, the generative approach unlocks direct tuning mechanisms through beam width and temperature sampling. These serve as precise levers to balance intent and exploration — allowing us to dial up strict precision on search pages, while turning up brand diversity and discovery on post-checkout surfaces."

Why this is a structural advantage

A scoring retrieval model has one knob: the top-K threshold. Two fundamental limitations follow:

  1. Top-K cannot encode exploration. It is a precision-truncation knob, not an exploration knob. Raising K surfaces more items of the same kind, not different categories of items.
  2. Per-surface tuning requires per-surface models. If search needs precision and post-checkout needs exploration, scoring retrieval requires two trained models (or at minimum two re-ranking heads).

Generative retrieval, by exposing two orthogonal runtime knobs — beam width (output-side) and temperature (per-step distribution shaping) — lets the same model serve surfaces with different intent profiles:

| Surface | Intent profile | Beam width | Temperature |
|---|---|---|---|
| Search page | Strict precision (user knows what they want) | Narrow | Low |
| Retailer home page | Broad discovery | Wider | Moderate |
| Pre-checkout | Cart-completion + brand-exploration | Wider | Higher |
| Post-checkout | Maximum brand diversity / exploration | Widest | Highest |

(Specific values per surface are not disclosed in the source.)
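
The "one model, per-surface knobs" idea can be sketched as a small lookup of decoding settings. This is a minimal illustration, not the source's implementation: the names `DecodingConfig`, `SURFACE_DECODING`, and `decoding_config_for` are hypothetical, and every numeric value is a placeholder chosen only to respect the relative ordering in the table, since the source discloses no actual values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecodingConfig:
    """Runtime decoding knobs; the underlying model is shared by all surfaces."""
    beam_width: int
    temperature: float


# HYPOTHETICAL placeholder values. The source gives only the relative ordering
# (narrow/low on search, widest/highest on post-checkout), not real numbers.
SURFACE_DECODING = {
    "search": DecodingConfig(beam_width=4, temperature=0.3),
    "retailer_home": DecodingConfig(beam_width=8, temperature=0.7),
    "pre_checkout": DecodingConfig(beam_width=8, temperature=1.0),
    "post_checkout": DecodingConfig(beam_width=16, temperature=1.3),
}


def decoding_config_for(surface: str) -> DecodingConfig:
    """Pick decoding knobs per request; no retraining or per-surface model."""
    return SURFACE_DECODING[surface]
```

The design point is that switching surfaces changes only the arguments passed to the decoder at request time.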

How the two knobs differ

Beam width is a structural knob — it changes how many parallel paths the decoder explores at each step. Wider beam:

  • Produces more distinct full SID sequences per request.
  • Each retained sequence is still one of the top-scoring candidates at each step.
  • Increases compute cost roughly linearly in the beam width.
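
The effect of beam width can be seen in a toy deterministic beam search. This is a sketch under stated assumptions: `TOY_MODEL` is an invented two-level next-token table standing in for a real SID decoder, and its token names and probabilities are illustrative only.

```python
import math

# Invented toy "model": maps a prefix (tuple of tokens) to next-token
# probabilities. Token names stand in for semantic-ID (SID) tokens.
TOY_MODEL = {
    (): {"dairy": 0.6, "snacks": 0.3, "drinks": 0.1},
    ("dairy",): {"milk": 0.7, "yogurt": 0.3},
    ("snacks",): {"chips": 0.9, "nuts": 0.1},
    ("drinks",): {"soda": 0.5, "juice": 0.5},
}


def beam_search(model, beam_width, num_steps):
    """Keep the `beam_width` best prefixes by cumulative log-probability."""
    beams = [((), 0.0)]  # (prefix, cumulative log-probability)
    for _ in range(num_steps):
        candidates = []
        # One model lookup per live prefix, so work per step scales with width.
        for prefix, score in beams:
            for token, prob in model[prefix].items():
                candidates.append((prefix + (token,), score + math.log(prob)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


narrow = [seq for seq, _ in beam_search(TOY_MODEL, beam_width=1, num_steps=2)]
wide = [seq for seq, _ in beam_search(TOY_MODEL, beam_width=3, num_steps=2)]
print(narrow)  # [('dairy', 'milk')]
print(wide)    # [('dairy', 'milk'), ('snacks', 'chips'), ('dairy', 'yogurt')]
```

With width 1 only the single greedy path survives; with width 3 the decoder also returns a sequence from a different first-level branch, at the price of expanding three prefixes per step instead of one.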

Temperature is a distribution-shaping knob — it changes the softmax temperature applied to the per-step logits. Higher temperature:

  • Flattens the distribution at each step, so lower-probability options become more likely to be selected into the beam.
  • Doesn't change the number of sequences explored, but does change which sequences make it into the beam.
  • Adds essentially no compute cost.
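
Temperature scaling divides the per-step logits by T before the softmax, i.e. p_i = exp(z_i / T) / sum_j exp(z_j / T). A minimal sketch of that flattening effect follows; the function names and the example logits are illustrative assumptions, not taken from the source.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax over logits / temperature, computed in a numerically stable way."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max before exponentiating
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a flatter distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


logits = [2.0, 1.0, 0.0]  # illustrative per-step logits
sharp = softmax_with_temperature(logits, temperature=0.5)
flat = softmax_with_temperature(logits, temperature=2.0)
# The top option loses mass and the tail gains mass as temperature rises,
# while the ranking of options within the step stays the same.
print(sharp[0] > flat[0], sharp[2] < flat[2], entropy(flat) > entropy(sharp))
```

The extra work is one division per logit, which is why the knob is close to free compared with widening the beam.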

Composing them:

  • Low temperature + wide beam = many paths through the high-probability region (deep precision-focused exploration).
  • High temperature + narrow beam = few paths through a broader region of the distribution (random walk-style exploration).
  • High temperature + wide beam = both, maximum diversity.
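
One way the two knobs could compose is a sampled beam decode. The source does not say how Instacart combines beam width with temperature sampling, so this sketch swaps in a known technique: per-step Gumbel-top-k selection, which samples `beam_width` candidates without replacement in proportion to their temperature-scaled sequence probability. `TOY_LOGITS` and all names here are invented for illustration.

```python
import math
import random

# Invented toy logits per prefix; token names stand in for SID tokens.
TOY_LOGITS = {
    (): {"dairy": 2.0, "snacks": 1.0, "drinks": 0.0},
    ("dairy",): {"milk": 1.5, "yogurt": 0.5},
    ("snacks",): {"chips": 2.0, "nuts": 0.0},
    ("drinks",): {"soda": 0.5, "juice": 0.5},
}


def log_softmax(logits, temperature):
    """Log-probabilities of the temperature-scaled next-token distribution."""
    scaled = {tok: z / temperature for tok, z in logits.items()}
    peak = max(scaled.values())
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scaled.values()))
    return {tok: s - log_norm for tok, s in scaled.items()}


def gumbel_noise(rng):
    """One standard Gumbel sample; the clamp avoids log(0)."""
    u = max(rng.random(), 1e-12)
    return -math.log(-math.log(u))


def sampled_beam_decode(model, beam_width, temperature, num_steps, rng):
    """Beam decode where survivors are sampled rather than strictly top-ranked."""
    beams = [((), 0.0)]  # (prefix, cumulative temperature-scaled log-prob)
    for _ in range(num_steps):
        candidates = []
        for prefix, score in beams:
            step = log_softmax(model[prefix], temperature)
            for tok, log_prob in step.items():
                candidates.append((prefix + (tok,), score + log_prob))
        # Gumbel-top-k: perturb each score once, keep the k largest.
        perturbed = [(score + gumbel_noise(rng), (seq, score))
                     for seq, score in candidates]
        perturbed.sort(key=lambda item: item[0], reverse=True)
        beams = [cand for _, cand in perturbed[:beam_width]]
    return [seq for seq, _ in beams]


rng = random.Random(0)
# Narrow beam, very low temperature: behaves like greedy decoding.
print(sampled_beam_decode(TOY_LOGITS, 1, 0.01, 2, rng))
# Wide beam, high temperature: more sequences, drawn from a flatter distribution.
print(sampled_beam_decode(TOY_LOGITS, 3, 2.0, 2, rng))
```

At a very low temperature the score gaps dwarf the noise, so the decode collapses to the greedy path; raising the temperature shrinks those gaps and lets lower-probability branches win slots, while beam width controls how many slots there are.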

Operational outcome at Instacart

The 2026-06 source attributes the retrieval-diversity wins (2.7× more brands and 1.8× more sub-categories, plus category-conditional gains of +421% Alcohol, +396% Beverages, and +229% Healthcare) to the structural property that generative retrieval allows directly tunable diversity, complemented by the autoregressive prefix-conditioning property that prevents flat-distribution outlier leakage.

The CTR lift (+5%) and add-to-carts lift (+34%) suggest the diversity is intent-aligned: wider beams aren't just retrieving random brands, they're surfacing brands that match user context and that the prior CR model couldn't reach because of its vocabulary-bottleneck and co-occurrence-memorisation failure modes.

Caveats

  • Specific beam width and temperature values per surface are not disclosed; the source describes the mechanism, not the production tuning.
  • Beam width's compute cost grows linearly; the compute envelope the GPU stack can sustain at production beam width is not disclosed.
  • The diversity-precision trade-off is not symmetric: a wider beam still retains the highest-scoring sequences, so it doesn't cost precision, but it does cost compute. The actual trade-off the post discloses is "compute spent on more candidates vs compute spent on tighter scoring of fewer candidates".
  • The wins are reported on browse surfaces (retailer home page + pre-checkout) where diversity is the goal. The "strict precision on search pages" mode is forward-looking; per the post, it has not yet shipped.
