Instacart Contextual Recommendations (CR)¶
Definition¶
Contextual Recommendations (CR) was Instacart's BERT-based retrieval-stage candidate generator for both ads and organic recommendations across all major browse surfaces, in production from ~2024–2026. CR was a scoring model: at inference time it produced a probability distribution over its entire trained vocabulary of atomic product IDs and returned the top-K. As of 2026-06 it is deprecated for ads retrieval on the retailer home page and pre-checkout, replaced by systems/instacart-generative-ads-retrieval, which generates Semantic IDs token-by-token instead.
Quote (Source: sources/2026-06-02-instacart-from-scoring-to-spelling-rebuilding-ads-retrieval-at-instacart):
"Two years ago, we introduced Contextual Recommendations (CR), a BERT-based sequence model powering retrieval for both ads and organic recommendations across all major browse surfaces."
Architecture¶
"At its core, CR treats grocery shopping as a language modeling task, where atomic product IDs function as tokens and, the finite subset of the catalog it is trained on, acts as its 'vocabulary'. The model leverages the user's real-time session, which includes product views, item page visits, and cart additions, as a sequence of these product tokens. A BERT-like transformer is then trained on millions of authentic shopping sessions to predict the next token (i.e. singular product) in the sequence."
shopping session: [view A, view B, cart-add C, view D, ...]
│
▼
[BERT-like Transformer encoder]
│
▼
probability distribution over full atomic-product-ID vocabulary
│
▼
top-K → CG output
The model treats grocery shopping as a language-modelling task: each user session is a sequence of atomic product IDs, the model predicts the next product in the sequence, and the top-K predicted products form the candidate set. "This single retrieval layer replaced multiple ad-hoc systems and powers recommendation carousels across all major browse surfaces, serving both ads and organic content."
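The score-then-top-K shape described above can be sketched in a few lines. This is a minimal illustration, not CR's implementation: a next-item count table stands in for the BERT-like transformer, and the sessions, product names, and function names are all hypothetical. What it keeps from the description is the contract: a session goes in, a probability for every vocabulary item comes out, and the top-K items form the candidate set.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy sessions; product names stand in for atomic product IDs.
SESSIONS = [
    ["eggs", "milk", "cereal"],
    ["eggs", "milk", "bread"],
    ["milk", "cereal", "bread"],
    ["eggs", "bread", "butter"],
]

# The vocabulary is the finite set of product IDs seen in training.
VOCAB = sorted({p for s in SESSIONS for p in s})


def train_next_item_counts(sessions):
    """Stand-in for the transformer: count which item follows which."""
    counts = defaultdict(Counter)
    for session in sessions:
        for prev, nxt in zip(session, session[1:]):
            counts[prev][nxt] += 1
    return counts


def score_vocabulary(counts, session):
    """Return a probability for EVERY vocabulary item (softmax over logits)."""
    last = session[-1]
    logits = [math.log1p(counts[last][item]) for item in VOCAB]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return {item: e / total for item, e in zip(VOCAB, exps)}


def top_k(probs, k, exclude=()):
    """Highest-probability items first; ties broken alphabetically."""
    ranked = sorted(probs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked if item not in exclude][:k]


counts = train_next_item_counts(SESSIONS)
probs = score_vocabulary(counts, ["eggs", "milk"])
candidates = top_k(probs, k=2, exclude={"eggs", "milk"})
print(candidates)
```

Note that `score_vocabulary` does work proportional to the vocabulary size on every call, which is the property the structural ceilings below trace back to.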
What worked¶
The CR architecture was a meaningful improvement over the "multiple ad-hoc systems" that preceded it. Two iterations on top of the base shape produced "meaningful gains in add-to-carts and ads coverage, particularly for specialty retailers and short shopping sessions":
- Vocabulary expansion — increasing the count of atomic product IDs CR was trained on to expand retrieval coverage.
- Richer context — adding retailer awareness and long-term user personalisation features.
Three structural ceilings — why CR was rebuilt¶
CR's scoring architecture (predict a probability distribution over the full vocabulary at every step) hit three ceilings that vocabulary expansion alone could not break (Source: sources/2026-06-02-instacart-from-scoring-to-spelling-rebuilding-ads-retrieval-at-instacart):
1. Vocabulary bottleneck¶
"The CR model relies on atomic product IDs as distinct tokens, which establishes the boundaries of what the model can interpret and predict. While expanding this vocabulary enhances the model's ability to understand the detailed context of a user's session, it simultaneously increases model size and latency while creating data sparsity for less common items. Additionally this catalog is non-stationary. As new products are added to the catalog, the coverage gap keeps expanding."
CR's vocabulary was bounded by training time; new products were only retrievable after the next training cycle, and even then only if they had enough sessions to escape data sparsity. The catalog grew faster than the vocabulary could expand. See concepts/vocabulary-bottleneck.
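A small sketch of why a frozen vocabulary turns into a widening coverage gap. The product IDs and helper names are hypothetical, and the `[UNK]` mapping is an assumption made for illustration (the source does not say how CR handled out-of-vocabulary IDs in the session input).

```python
# Hypothetical IDs; the vocabulary is frozen when training finishes.
TRAINED_VOCAB = frozenset({"p1", "p2", "p3", "p4"})
UNK = "[UNK]"  # assumed placeholder for IDs the model never saw


def tokenize(session, vocab=TRAINED_VOCAB):
    """Map each product ID to itself if known, else to the unknown token."""
    return [pid if pid in vocab else UNK for pid in session]


def coverage(catalog, vocab=TRAINED_VOCAB):
    """Fraction of the catalog that the model is able to return at all."""
    catalog = set(catalog)
    return len(catalog & vocab) / len(catalog)


catalog_at_training = ["p1", "p2", "p3", "p4", "p5"]
catalog_later = catalog_at_training + ["p6", "p7", "p8"]  # new launches

tokens = tokenize(["p2", "p7", "p4"])
cov_before = coverage(catalog_at_training)
cov_after = coverage(catalog_later)
print(tokens, cov_before, cov_after)
```

With the vocabulary fixed at four IDs, coverage in this toy falls from 0.8 to 0.5 as the catalog grows from five products to eight, and the new product in the session contributes no usable context.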
2. Cold-start hurdle¶
"To train this model, the historical shopping sessions were designed as sequences of atomic product IDs. This occasionally caused it to memorize co-occurrences instead of learning generalized associations based on the user's intent. This resulted in the model favoring high-frequency items over newer products which are more aligned with the user's context."
The post gives a concrete failure mode: a user building a summer-barbecue cart (ground beef, hamburger buns, lettuce) would receive a generic staple (milk) ahead of an emerging brand's contextually aligned mustard. The model had memorised that milk often follows ground beef without internalising what kind of cart the user was building. See concepts/cold-start (recsys cold-start axis).
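The barbecue example can be reproduced with a pure co-occurrence scorer. This is deliberately not a transformer: it is a hypothetical worst case that only memorises co-occurrence, with invented session counts, to show how raw frequency can bury a contextually aligned but rarely seen product.

```python
from collections import Counter

# Hypothetical training sessions. "new_mustard" is a recently launched product
# seen in one session; "milk" is a staple that appears in many carts.
SESSIONS = (
    [["ground_beef", "milk"]] * 40
    + [["hamburger_buns", "milk"]] * 30
    + [["lettuce", "milk"]] * 25
    + [["ground_beef", "hamburger_buns", "new_mustard"]]
)


def cooccurrence_scores(sessions, cart):
    """Score each candidate by how often it shares a session with cart items."""
    scores = Counter()
    cart_items = set(cart)
    for session in sessions:
        overlap = cart_items & set(session)
        for item in set(session) - cart_items:
            scores[item] += len(overlap)
    return scores


cart = ["ground_beef", "hamburger_buns", "lettuce"]
scores = cooccurrence_scores(SESSIONS, cart)
ranking = [item for item, _ in scores.most_common()]
print(ranking, dict(scores))
```

Here the mustard matches two of the three cart items in its only session, yet the staple wins on volume alone (95 to 2).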
3. Structural drift¶
"The final candidate set from the model is generated by predicting a probability distribution across the entire vocabulary of product IDs. Without a built-in hierarchy to keep the recommendations focused, the model occasionally retrieves a disjointed mix of items. For example, a breakfast-themed cart [e.g., milk, eggs, cereal] may lead to laundry detergent being retrieved along with other valid recommendations [e.g., bread, muffins]. If the subsequent ranking model was miscalibrated on these outlier products, these incoherent recommendations from the candidate set would eventually get bubbled up to the user next to a perfectly good set of recommendations."
Flat probability distributions across a heterogeneous vocabulary do not enforce category coherence: the architecture has no mechanism to say "if the user is shopping breakfast, only generate breakfast-adjacent products." The successor's generative retrieval paradigm fixes this with autoregressive prefix conditioning.
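The contrast between flat top-K and prefix conditioning can be illustrated with a toy two-level ID. Everything here is an assumption for illustration: the scores are invented, and human-readable category names stand in for the first token of a Semantic ID (the real IDs and decoding procedure are described on the successor's page, not here).

```python
from collections import defaultdict

# Hypothetical flat scores over a mixed vocabulary for a breakfast cart.
FLAT_SCORES = {
    "bread": 0.30,
    "muffins": 0.25,
    "laundry_detergent": 0.20,
    "orange_juice": 0.15,
    "dish_soap": 0.10,
}

# Hypothetical first-token prefix for each item.
PREFIX_OF = {
    "bread": "breakfast",
    "muffins": "breakfast",
    "orange_juice": "breakfast",
    "laundry_detergent": "household",
    "dish_soap": "household",
}


def flat_top_k(scores, k):
    """Scoring-style retrieval: rank the whole vocabulary, keep the top k."""
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


def prefix_conditioned_top_k(scores, prefix_of, k, beam_width=1):
    """Pick the best prefixes first, then rank only items under them."""
    prefix_mass = defaultdict(float)
    for item, score in scores.items():
        prefix_mass[prefix_of[item]] += score
    kept = sorted(prefix_mass, key=lambda p: (-prefix_mass[p], p))[:beam_width]
    eligible = {i: s for i, s in scores.items() if prefix_of[i] in kept}
    return flat_top_k(eligible, k)


flat = flat_top_k(FLAT_SCORES, 3)
constrained = prefix_conditioned_top_k(FLAT_SCORES, PREFIX_OF, 3)
print(flat)
print(constrained)
```

In the flat ranking the detergent takes the third slot on score alone; once the first decoding step commits to a prefix, items outside that prefix are no longer reachable, so the candidate set stays within one theme.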
Why CR was a meaningful precursor¶
The post is careful to position CR not as a failed system but as the load-bearing precursor that made the generative successor possible. CR established three things the successor inherits:
- Sequence-as-shopping-trajectory framing — the language-modelling-as-recsys analogy is unchanged in the successor (the change is what the tokens are, not what the sequences represent).
- Single-model multi-surface serving — CR replacing "multiple ad-hoc systems" established the pattern; the successor inherits the same one-model-across-surfaces shape with surface-specific beam-width / temperature dials replacing surface-specific models.
- Real-time session context as input — CR already consumed views / item-page-visits / cart-additions as real-time signal; the successor's prompt template (patterns/context-template-prompt-with-special-tokens) is a richer continuation, not a different paradigm.
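As a rough sketch of what "surface-specific dials on one model" could look like, the snippet below keeps a single scoring function and varies only decoding settings per surface. The `DecodeConfig` type, the surface keys, and every numeric value are hypothetical; the source does not disclose the actual settings.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DecodeConfig:
    beam_width: int
    temperature: float


# Hypothetical per-surface settings; no real values are disclosed.
SURFACE_CONFIGS = {
    "retailer_home": DecodeConfig(beam_width=8, temperature=1.0),
    "pre_checkout": DecodeConfig(beam_width=4, temperature=0.5),
}


def softmax_with_temperature(logits, temperature):
    """Lower temperature concentrates probability on the top-scoring tokens."""
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # same model output, decoded two ways
home = softmax_with_temperature(logits, SURFACE_CONFIGS["retailer_home"].temperature)
checkout = softmax_with_temperature(logits, SURFACE_CONFIGS["pre_checkout"].temperature)
print(home, checkout)
```

The design point is that the surfaces share weights and differ only in configuration, so adding a surface means adding a config entry rather than training another model.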
Deprecation scope¶
The 2026-06 deprecation is partial, not total:
- Deprecated for ads retrieval on retailer home page + pre-checkout — replaced by systems/instacart-generative-ads-retrieval.
- CR's role on other surfaces (search, post-checkout, organic recommendations) is not addressed in the post. Reasonable inference: CR continues to serve those surfaces until the generative paradigm is extended to them or a different successor is identified.
Sibling architectures elsewhere on the wiki¶
- Pinterest Contextual Sequential CG — Pinterest ads CG with the same family shape (Transformer-based two-tower / sequence model with real-time context layer); both Pinterest's CG and Instacart's CR are scoring-side sequence-model CGs with real-time context.
- systems/pinterest-shopping-conversion-cg: another scoring-side sibling.
- Meta SilverTorch — orthogonal evolution of scoring-side recsys retrieval (absorbs the ANN index into the model graph as a tensor); a different way to not-score-every-item than Instacart's full pivot to generation.
Caveats¶
- Specific pre-CR system list ("multiple ad-hoc systems") not enumerated.
- Vocabulary size at deprecation not disclosed.
- Retraining cadence and feature engineering details deferred to the prior CR write-up linked from the source post (Sequence Models for Contextual Recommendations at Instacart, not yet ingested).
- Deprecation timeline for non-ads / non-browse surfaces not disclosed.
Seen in¶
- sources/2026-06-02-instacart-from-scoring-to-spelling-rebuilding-ads-retrieval-at-instacart — canonicalises CR as the prior architecture and the three structural ceilings that motivated the generative successor.
Related¶
- systems/instacart-generative-ads-retrieval — the successor on the ads-retrieval surface.
- systems/instacart-semantic-ids — the new vocabulary substrate that replaces atomic product IDs.
- systems/instacart-carrot-ads — the broader ads platform CR served as the candidate-generation stage of.
- systems/transformer — the BERT-family architecture CR was built on.
- concepts/vocabulary-bottleneck / concepts/cold-start / concepts/atomic-product-id-vs-semantic-id — the structural ceilings.
- concepts/two-tower-architecture / concepts/retrieval-ranking-funnel — the broader retrieval design space CR sat in.