
Instacart Generative Ads Retrieval

Definition

Instacart Generative Ads Retrieval is the candidate-generation (retrieval) stage of Instacart's ads platform on browse surfaces (the retailer home page and pre-checkout). It is an autoregressive Transformer decoder that uses beam search to generate the next recommended item token by token, as a sequence of Semantic IDs (SIDs). It replaces the prior BERT-based Contextual Recommendations (CR) scoring model, which predicted a probability distribution over the entire vocabulary of atomic product IDs.

Quote (Source: sources/2026-06-02-instacart-from-scoring-to-spelling-rebuilding-ads-retrieval-at-instacart):

"We rebuilt the system, by moving from an encoder that scores products to a generative model that spells them out, token by token."

The architecture is inspired by TIGER (Google DeepMind, NeurIPS 2023) and belongs to the same generative-paradigm wave as Spotify GLIDE/NEO and YouTube PLUM.

Where it sits in the stack

             retailer home page / pre-checkout request
                                 │
                                 ▼
              ┌────────────────────────────────────────────┐
              │          Candidate Generator (CG)          │
              │  ┌──────────────────────────────────────┐  │
              │  │  Input Translation                   │  │
              │  │  → context template prompt:          │  │
              │  │    [retailer-token]                  │  │
              │  │    [user-history-SID-1...N]          │  │
              │  │    [cart-SID-1...M]                  │  │
              │  └───────────────┬──────────────────────┘  │
              │                  ▼                         │
              │  ┌──────────────────────────────────────┐  │
              │  │  GPU Model Inference                 │  │
              │  │  → autoregressive decoder            │  │
              │  │  → beam search over codeword steps   │  │
              │  │  → K distinct full SID sequences     │  │
              │  └───────────────┬──────────────────────┘  │
              │                  ▼                         │
              │  ┌──────────────────────────────────────┐  │
              │  │  Product Mapping & Indexing          │  │
              │  │  → retailer-partitioned index        │  │
              │  │  → SIDs → available, attributed ads  │  │
              │  └───────────────┬──────────────────────┘  │
              └──────────────────┼─────────────────────────┘
                                 ▼
                   downstream Carrot Ads ranker
                                 ▼
                          user impression

The three serving operations

Per the Source page:

Input Translation — "Features are dynamically fetched and collated to create the input prompt." The prompt template (patterns/context-template-prompt-with-special-tokens) is assembled from retailer-type token + top-N user-history SIDs + cart SIDs, with special tokens delimiting segments.
GPU Model Inference — "The model runs inference and generates relevant SID sequences." Autoregressive decoder + beam search over codeword positions; produces K distinct fully-formed SID sequences per request.
Product Mapping and Indexing — "The generated SIDs are mapped back to active ad products via a specialized, highly efficient retailer-partitioned index, ensuring that only relevant, available, and correctly attributed ads are retrieved."
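A minimal sketch of the two operations that bracket model inference, Input Translation and Product Mapping. It assumes a hypothetical token scheme (`<bos>`, `<retailer:TYPE>`, `<hist>`, `<cart>`, `<gen>`, and level-prefixed codewords such as `c0_12`), a hypothetical `top_n` default of 20, and an in-memory dictionary standing in for the retailer-partitioned index. The post discloses neither its token vocabulary nor its index implementation, and the model call itself is represented only by the list of generated SIDs passed to `lookup`.

```python
from collections import defaultdict
from typing import Callable

SID = tuple[int, ...]  # one codeword per level, e.g. (12, 7, 201)

# Hypothetical special tokens delimiting the prompt segments.
BOS, HIST, CART, GEN = "<bos>", "<hist>", "<cart>", "<gen>"


def sid_tokens(sid: SID) -> list[str]:
    # Prefix each codeword with its level so level-0 code 5 and level-1 code 5 differ.
    return [f"c{level}_{code}" for level, code in enumerate(sid)]


def build_prompt(retailer_type: str, history: list[SID], cart: list[SID],
                 top_n: int = 20) -> list[str]:
    """Input Translation: retailer token, top-N history SIDs, then cart SIDs."""
    tokens = [BOS, f"<retailer:{retailer_type}>", HIST]
    for sid in history[:top_n]:
        tokens += sid_tokens(sid)
    tokens.append(CART)
    for sid in cart:
        tokens += sid_tokens(sid)
    tokens.append(GEN)  # the decoder starts spelling the next item's SID here
    return tokens


class RetailerPartitionedIndex:
    """Product Mapping: retailer -> SID -> ad product IDs sharing that SID."""

    def __init__(self) -> None:
        self._parts: defaultdict = defaultdict(lambda: defaultdict(list))

    def add(self, retailer: str, sid: SID, product_id: str) -> None:
        self._parts[retailer][sid].append(product_id)

    def lookup(self, retailer: str, generated: list[SID],
               is_active: Callable[[str], bool]) -> list[str]:
        part = self._parts.get(retailer, {})  # .get avoids creating empty partitions
        results: list[str] = []
        for sid in generated:  # preserve beam-search order
            for pid in part.get(sid, []):
                if is_active(pid) and pid not in results:
                    results.append(pid)
        return results
```

Keying the index by retailer first means a generated SID can only resolve to the requesting retailer's products, and because several products can share one SID, a single generated sequence may expand into more than one ad.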

What changes between training and serving

Training is conventional next-token prediction: "During training, the model reads this template and learns to autoregressively generate the SID of the next item the user adds to their cart." The training target is the SID of the item the user actually added next, and the model learns to generate it conditioned on the prompt prefix.
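The training setup described above can be sketched as standard teacher forcing over the concatenated prompt and target SID. This is a minimal sketch under two assumptions the post does not state: tokens are already mapped to integer IDs, and the loss is computed only on the target SID positions, with prompt positions masked by `-100` (the default `ignore_index` of PyTorch's `CrossEntropyLoss`).

```python
IGNORE_INDEX = -100  # masked label value; PyTorch's CrossEntropyLoss ignores -100 by default


def make_training_example(prompt_ids: list[int],
                          target_sid_ids: list[int]) -> tuple[list[int], list[int]]:
    """Shift-by-one next-token example: inputs[i] should predict labels[i]."""
    if not prompt_ids or not target_sid_ids:
        raise ValueError("both the prompt and the target SID must be non-empty")
    seq = prompt_ids + target_sid_ids
    inputs, labels = seq[:-1], seq[1:]
    # Labels that are still prompt tokens are masked, so the loss only rewards
    # spelling the next item's SID, one codeword at a time.
    n_masked = len(prompt_ids) - 1
    labels = [IGNORE_INDEX] * n_masked + labels[n_masked:]
    return inputs, labels
```

For example, `make_training_example([10, 11, 12], [5, 6, 7])` returns `([10, 11, 12, 5, 6], [-100, -100, 5, 6, 7])`: the last prompt token predicts the first codeword, and each codeword predicts the next.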

Serving uses beam search rather than greedy decoding: "At each step, beam search explores multiple promising paths for the next codeword. This ultimately yields several distinct, fully formed SID sequences." Beam width and temperature are exposed as runtime knobs (concepts/diversity-via-beam-and-temperature) so the same model can be tuned for different surfaces — "strict precision on search pages, while turning up brand diversity and discovery on post-checkout surfaces."
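The serving-time decoding described above can be sketched as a plain beam search over codeword levels. This is an illustrative sketch, not Instacart's implementation: `score` is a hypothetical stand-in for the decoder (a function from the SID prefix generated so far to logits over the codebook), and temperature is applied by dividing the logits before a log-softmax, which is one common placement. The post does not say how its temperature knob is wired in. Because every surviving beam extends a different prefix, the returned sequences are distinct by construction.

```python
import math
from typing import Callable, Sequence

SID = tuple[int, ...]
Scorer = Callable[[SID], Sequence[float]]  # SID prefix -> logits over the codebook


def log_softmax(logits: Sequence[float], temperature: float) -> list[float]:
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def beam_search_sids(score: Scorer, num_levels: int, beam_width: int,
                     temperature: float = 1.0) -> list[tuple[SID, float]]:
    """Return up to beam_width distinct full SIDs with their summed log-probs."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    beams: list[tuple[SID, float]] = [((), 0.0)]
    for _ in range(num_levels):
        candidates: list[tuple[SID, float]] = []
        for prefix, logp in beams:
            # Each codeword is conditioned on the codewords already chosen.
            step = log_softmax(score(prefix), temperature)
            candidates += [(prefix + (code,), logp + lp) for code, lp in enumerate(step)]
        # Keep the best beam_width partial SIDs for the next level.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams
```

With a toy scorer whose level-1 codeword strongly prefers to match the level-0 codeword, `beam_width=2` over two levels returns `(0, 0)` and `(1, 1)` rather than mixed prefixes. Widening `beam_width` returns more, lower-scoring sequences. In a deterministic search like this one, temperature never reorders codewords within a single step; it only changes how per-level scores trade off across levels.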

How it dissolves three structural ceilings of the prior CR

The prior CR model hit three structural ceilings, each addressed by a property of the generative paradigm (Source: sources/2026-06-02-instacart-from-scoring-to-spelling-rebuilding-ads-retrieval-at-instacart):

Ceiling on CR	Why generative dissolves it
Vocabulary bottleneck: model size and latency grow with the catalog; tail items suffer from data sparsity; the non-stationary catalog widens the coverage gap	Fixed codebook size; "the model constructs the semantic representation of the next item on the fly, avoiding the memory and latency penalties that previously restricted our catalog coverage."
Cold-start hurdle: co-occurrence memorisation favours high-frequency items over newer, intent-aligned products	SIDs share prefixes for semantically similar products; "a new product entering the catalog is added to one of the existing SIDs and is visible to the model from day one."
Structural drift: a flat probability distribution over atomic IDs occasionally retrieves a disjointed mix (laundry detergent in a breakfast cart)	Autoregressive prefix conditioning; "each codeword is explicitly conditioned on the previous one. This enforces a strict hierarchy during retrieval."
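The vocabulary-bottleneck row can be made concrete with a back-of-the-envelope count of output-layer parameters. The catalog size, hidden width, codebook size and number of levels below are hypothetical (the post does not disclose its codebook size), and the sketch assumes one output projection per SID level; it only shows how the two designs scale.

```python
def atomic_output_params(catalog_size: int, hidden_dim: int) -> int:
    # Scoring over atomic product IDs: one output row per catalog item.
    return catalog_size * hidden_dim


def sid_output_params(codebook_size: int, num_levels: int, hidden_dim: int) -> int:
    # Spelling SIDs: one output row per codeword per level, independent of catalog size.
    return codebook_size * num_levels * hidden_dim


def sid_capacity(codebook_size: int, num_levels: int) -> int:
    # Number of distinct SIDs the decoder can spell.
    return codebook_size ** num_levels


# Hypothetical sizes for illustration only.
CATALOG, HIDDEN, CODEBOOK, LEVELS = 5_000_000, 512, 256, 3

if __name__ == "__main__":
    print(f"atomic-ID output layer: {atomic_output_params(CATALOG, HIDDEN):,} params")
    print(f"SID output layers:      {sid_output_params(CODEBOOK, LEVELS, HIDDEN):,} params")
    print(f"addressable SIDs:       {sid_capacity(CODEBOOK, LEVELS):,}")
```

With these numbers the atomic-ID head needs 2,560,000,000 parameters versus 393,216 for the SID heads, while 256³ = 16,777,216 addressable SIDs already exceed the catalog. Catalog growth changes the first figure but not the second, and because a new product is added to an existing SID (per the cold-start row), it needs no new output rows either.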

Serving substrate

Per the Source: "As autoregressive decoding with beam search is fairly compute intensive, it was not viable to serve this model [on] the legacy serving stack that relied on Python and CPU inference. To unblock this model serving, the team developed a brand new GPU serving stack."

Inference engine: TensorRT-LLM — NVIDIA's high-performance LLM inference compiler.
Serving runtime: NVIDIA Triton Inference Server.
Service shell: Go-native service — "delivers higher throughput and lower latency compared to the legacy Python environment."
ML platform integration: fully integrated with Griffin 2.0, Instacart's ML serving platform.

The serving-substrate change, from Python+CPU to Go+GPU, is what made the heavier compute cost of autoregressive decoding with beam search viable; the post frames the new stack as a prerequisite for serving the model ("To unblock this model serving"), not an afterthought.
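The compute argument can be illustrated with a rough count of work per request. This is a simplified sketch with a hypothetical prompt length, level count and beam width (none are disclosed). It counts only sequential forward passes and token positions processed, ignores model size and the scoring model's full-vocabulary output layer, and contrasts decoding with and without a key/value cache, a standard optimisation in LLM inference engines such as TensorRT-LLM.

```python
def scoring_cost(prompt_len: int) -> tuple[int, int]:
    """Encoder scoring (CR-style): one forward pass over the prompt."""
    return 1, prompt_len  # (sequential forward passes, token positions processed)


def beam_decode_cost(prompt_len: int, num_levels: int, beam_width: int,
                     kv_cache: bool) -> tuple[int, int]:
    """Autoregressive SID decoding: one sequential pass per codeword level."""
    # Pass 1 processes the prompt and yields level-0 logits; passes 2..num_levels
    # each extend all beam_width beams by one codeword.
    passes = num_levels
    if kv_cache:
        # Cached keys/values: later passes feed only each beam's newest codeword.
        tokens = prompt_len + (num_levels - 1) * beam_width
    else:
        # No cache: pass t+1 re-processes the prompt plus t codewords for every beam.
        tokens = prompt_len + sum(beam_width * (prompt_len + t) for t in range(1, num_levels))
    return passes, tokens
```

With a hypothetical 64-token prompt, 3 levels and a beam width of 16, scoring costs `(1, 64)`, cached decoding `(3, 96)` and uncached decoding `(3, 2160)`. Even with caching, decoding adds sequential passes that show up directly in latency, which is the workload the new GPU stack was built to serve.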

Operational outcomes

Metric	Value
Candidate volume	~2× more candidates per request
Mean retrieval latency	−10–17% (despite 2× volume)
Click-through rate	+5%
Add-to-carts	+34% (the post calls it a "step-function increase")
Brand diversity in recommendations	2.7× more brands
Sub-category diversity	1.8× more sub-categories
Alcohol category diversity	+421%
Beverages category diversity	+396%
Healthcare category diversity	+229%

Surfaces launched on

Two browse surfaces are explicitly named:
- Retailer home page: the start of a shopping session
- Pre-checkout phase: just before the order is finalized

Per the Source: "these are contexts where users are browsing rather than searching, and candidate diversity & contextual relevance matter more than surgical precision." The retailer home page maximises discovery; pre-checkout maximises basket-completion / brand-diversity exposure on the way to purchase.

Search and post-checkout surfaces are explicitly named as future candidates for the same model with different beam-width / temperature settings.

Composition with the rest of the Instacart ads stack

This system is the candidate generator in the Carrot Ads stack; the generated candidates feed the Carrot Ads pCTR ranker which scores them against the real-time auction. The pCTR ranker is unchanged by this work — the post is exclusively about retrieval — which means brand-diversity / cold-start gains have to compose with the existing pCTR scoring without ranker miscalibration.

Caveats

Codebook size, beam width, temperature settings not disclosed.
p99/p99.9 latency vs CR not disclosed.
GPU SKU / cluster topology / cost not disclosed.
Surfaces remain limited to two; search and post-checkout deferred.
Ranker-side (pCTR) changes not addressed.
"If the subsequent ranking model was miscalibrated on these outlier products, these incoherent recommendations from the candidate set would eventually get bubbled up to the user" — acknowledges ranker-CG calibration risk but no mitigation reported.

Seen in

sources/2026-06-02-instacart-from-scoring-to-spelling-rebuilding-ads-retrieval-at-instacart — the canonical disclosure of this system.

systems/instacart-semantic-ids — the vocabulary substrate this model decodes into.
systems/instacart-contextual-recommendations — the prior model this replaces.
systems/instacart-carrot-ads / systems/instacart-carrot-ads-pctr-model — the downstream ranker stage.
systems/tiger-generative-retrieval — the architectural reference paper.
systems/silvertorch: Meta's concepts/index-as-model retrieval-paradigm sibling, a different way of avoiding scoring every item (an in-graph index) compared with Instacart's generative approach.
systems/pinterest-contextual-sequential-cg — Pinterest's scoring-side sequence-model CG; same family as the prior CR.
concepts/generative-retrieval / concepts/beam-search-retrieval / concepts/retailer-partitioned-index — canonical concepts.