PATTERN Cited by 1 source

Beam search with retailer-partitioned mapping¶

Pattern¶

Compose beam search over a generative-retrieval decoder with a retailer-partitioned index that maps generated Semantic IDs back to available, attributed candidate items for the current retailer's catalog. The combination is the canonical inference shape for multi-tenant generative recsys retrieval.

Quote (Source: sources/2026-06-02-instacart-from-scoring-to-spelling-rebuilding-ads-retrieval-at-instacart):

"During serving, we build the candidate set via beam search. As illustrated in the diagram below, the decoder reads this input and generates recommendations token by token. At each step, beam search explores multiple promising paths for the next codeword. This ultimately yields several distinct, fully formed SID sequences. Finally, these generated sequences are mapped against a retailer-partitioned index to retrieve a diverse variety of relevant, available ad products."

Three structural pieces¶

1. Beam search over the codeword positions¶

Generate K-codeword Semantic IDs via beam search: - At each codeword position, maintain top-B (beam width) most likely partial sequences. - Continue for K decode steps (one per codeword position). - Output: B distinct full SID sequences per request.

The model itself is catalog-agnostic — it doesn't know which retailer's catalog the request is for. It generates SIDs from the global codebook.

2. Retailer-partitioned index lookup¶

For each generated SID s_i and the current request's retailer_id:

key:    (retailer_id, s_i)
value:  list of (product_id, availability, ad_metadata)

The lookup returns only products that exist at the current retailer, are in stock, are ad-eligible, and are correctly attributed for the current retailer's ad business.

3. Filtered candidate set → ranker¶

Concatenate the lookup results across all generated SIDs into a single candidate set; deduplicate; pass to the downstream ranker (at Instacart, the Carrot Ads pCTR model).

Why this composition is the right shape¶

Three reasons the post-decode mapping needs to be retailer- partitioned (not catalog-global):

1. Multi-retailer marketplace requires per-retailer scope¶

Instacart hosts many retailers' catalogs. The generative retriever produces SIDs over the global codebook; only products available at the current retailer should be candidates. Without the partition, the candidate set would include products that don't exist at this retailer.

2. Product availability is per-(retailer, product)¶

A product's SID is global (codebook-derived); whether the product is in stock, purchasable, or advertised at a given retailer is per-(retailer, product). The mapping layer is where this freshness cross-cuts retrieval.

3. Ad-attribution is per-retailer¶

Carrot Ads runs both Instacart's own ads and per-retailer ad businesses. The partition ensures the correct attribution ledger is hit for the current request — preventing cross-retailer ad crosstalk.

Why beam search complements the partition¶

A naive alternative — "generate SIDs globally then filter to the retailer's catalog" — wastes beam-search effort on out-of-catalog SIDs. Three failure modes:

Beam diversity dilutes — if a beam mostly contains SIDs not in the retailer's catalog, the effective candidate count is smaller than the beam width suggests.
Wasted compute — the autoregressive decoder runs the same cost regardless of whether the resulting SIDs map to anything.
Inconsistent latency budgets — some requests waste 90% of the beam on out-of-catalog SIDs; others find catalog-resident matches early.

The retailer-partitioned index (not just filter) lets the mapping layer reason about retailer-specific catalog at lookup time without re-running the model. The model stays catalog-agnostic; the lookup absorbs the partition.

Cost geometry¶

Cost component	Scaling
Beam search inference	O(beam_width × decode_steps × decoder_compute_per_step) — independent of catalog
Index lookup	O(beam_width × avg_products_per_(retailer, SID))
Per-request total	dominated by inference

Tunable knobs¶

Beam width — higher = more diverse SIDs surfaced.
Temperature — higher = more exploration in per-step distribution.
Per-(retailer, SID) lookup cap — if a single SID maps to many products at a retailer, cap the per-SID returns to bound candidate-set size.

See concepts/diversity-via-beam-and-temperature.

Caveats¶

Specific beam widths used by Instacart not disclosed.
Index-lookup latency relative to inference latency not disclosed (probably small but unverified).
Cross-retailer SID popularity skew (whether SIDs popular at one retailer are dominant in another's beam) not addressed.
Update cadence of the index (when product availability changes across retailers in real time) not disclosed.
The post calls the index "highly efficient" without benchmarking — the lookup cost is bundled into the −10–17% latency reduction headline.
The pattern requires the consuming retriever to be multi-tenant- aware at training time (it must learn that "retailer type" affects which SIDs to generate); see patterns/context-template-prompt-with-special-tokens for the prompt-template piece that carries this context.

Sibling patterns¶

patterns/per-tenant-workspace-in-third-party-saas — the per-tenant-isolation pattern at the data-substrate altitude.
concepts/document-sharding (Meta SilverTorch) — the per-shard-routing pattern at the GPU-parallelism altitude.

Seen in¶

sources/2026-06-02-instacart-from-scoring-to-spelling-rebuilding-ads-retrieval-at-instacart — first canonical wiki disclosure of beam-search + retailer- partitioned mapping as the inference shape for production generative ads retrieval.

concepts/beam-search-retrieval / concepts/generative-retrieval / concepts/retailer-partitioned-index / concepts/semantic-id — supporting concepts.
systems/instacart-generative-ads-retrieval / systems/instacart-marketplace / systems/instacart-carrot-ads — production context.
patterns/generative-over-scoring-retrieval / patterns/context-template-prompt-with-special-tokens — composing patterns.