Skip to content

PATTERN Cited by 1 source

Beam search with retailer-partitioned mapping

Pattern

Compose beam search over a generative-retrieval decoder with a retailer-partitioned index that maps generated Semantic IDs back to available, attributed candidate items for the current retailer's catalog. The combination is the canonical inference shape for multi-tenant generative recsys retrieval.

Quote (Source: sources/2026-06-02-instacart-from-scoring-to-spelling-rebuilding-ads-retrieval-at-instacart):

"During serving, we build the candidate set via beam search. As illustrated in the diagram below, the decoder reads this input and generates recommendations token by token. At each step, beam search explores multiple promising paths for the next codeword. This ultimately yields several distinct, fully formed SID sequences. Finally, these generated sequences are mapped against a retailer-partitioned index to retrieve a diverse variety of relevant, available ad products."

Three structural pieces

1. Beam search over the codeword positions

Generate K-codeword Semantic IDs via beam search: - At each codeword position, maintain top-B (beam width) most likely partial sequences. - Continue for K decode steps (one per codeword position). - Output: B distinct full SID sequences per request.

The model itself is catalog-agnostic — it doesn't know which retailer's catalog the request is for. It generates SIDs from the global codebook.

2. Retailer-partitioned index lookup

For each generated SID s_i and the current request's retailer_id:

key:    (retailer_id, s_i)
value:  list of (product_id, availability, ad_metadata)

The lookup returns only products that exist at the current retailer, are in stock, are ad-eligible, and are correctly attributed for the current retailer's ad business.

3. Filtered candidate set → ranker

Concatenate the lookup results across all generated SIDs into a single candidate set; deduplicate; pass to the downstream ranker (at Instacart, the Carrot Ads pCTR model).

Why this composition is the right shape

Three reasons the post-decode mapping needs to be retailer- partitioned (not catalog-global):

1. Multi-retailer marketplace requires per-retailer scope

Instacart hosts many retailers' catalogs. The generative retriever produces SIDs over the global codebook; only products available at the current retailer should be candidates. Without the partition, the candidate set would include products that don't exist at this retailer.

2. Product availability is per-(retailer, product)

A product's SID is global (codebook-derived); whether the product is in stock, purchasable, or advertised at a given retailer is per-(retailer, product). The mapping layer is where this freshness cross-cuts retrieval.

3. Ad-attribution is per-retailer

Carrot Ads runs both Instacart's own ads and per-retailer ad businesses. The partition ensures the correct attribution ledger is hit for the current request — preventing cross-retailer ad crosstalk.

Why beam search complements the partition

A naive alternative — "generate SIDs globally then filter to the retailer's catalog" — wastes beam-search effort on out-of-catalog SIDs. Three failure modes:

  • Beam diversity dilutes — if a beam mostly contains SIDs not in the retailer's catalog, the effective candidate count is smaller than the beam width suggests.
  • Wasted compute — the autoregressive decoder runs the same cost regardless of whether the resulting SIDs map to anything.
  • Inconsistent latency budgets — some requests waste 90% of the beam on out-of-catalog SIDs; others find catalog-resident matches early.

The retailer-partitioned index (not just filter) lets the mapping layer reason about retailer-specific catalog at lookup time without re-running the model. The model stays catalog-agnostic; the lookup absorbs the partition.

Cost geometry

Cost component Scaling
Beam search inference O(beam_width × decode_steps × decoder_compute_per_step) — independent of catalog
Index lookup O(beam_width × avg_products_per_(retailer, SID))
Per-request total dominated by inference

Tunable knobs

  • Beam width — higher = more diverse SIDs surfaced.
  • Temperature — higher = more exploration in per-step distribution.
  • Per-(retailer, SID) lookup cap — if a single SID maps to many products at a retailer, cap the per-SID returns to bound candidate-set size.

See concepts/diversity-via-beam-and-temperature.

Caveats

  • Specific beam widths used by Instacart not disclosed.
  • Index-lookup latency relative to inference latency not disclosed (probably small but unverified).
  • Cross-retailer SID popularity skew (whether SIDs popular at one retailer are dominant in another's beam) not addressed.
  • Update cadence of the index (when product availability changes across retailers in real time) not disclosed.
  • The post calls the index "highly efficient" without benchmarking — the lookup cost is bundled into the −10–17% latency reduction headline.
  • The pattern requires the consuming retriever to be multi-tenant- aware at training time (it must learn that "retailer type" affects which SIDs to generate); see patterns/context-template-prompt-with-special-tokens for the prompt-template piece that carries this context.

Sibling patterns

Seen in

Last updated · 542 distilled / 1,571 read