

Generative recommendations

Definition

Generative recommendations is the class of recommendation-system architectures that use generative AI models (typically LLMs) to produce the content of a recommendation surface — titles, themes, section structure, item descriptions, or the items themselves — rather than only ranking a pre-authored catalog of human-defined recommendations.

Classical recommendation systems match users to items from a pre-existing item pool. Generative recommendations extend the surface with an LLM-produced layer: themes, placements, section titles, per-user explanations — artefacts that didn't exist in the catalog until the model generated them.

Two generation paradigms

The 2026-02-26 Instacart post frames the design choice as a pair:

  • Bottom-up generation — LLM generates all relevant items, then a clustering step groups them into themed sections.
  • Top-down generation — LLM generates ordered themed sections first, then generates items per section.

Instacart's post names the adaptability + cohesion tradeoff as the decisive factor and picks top-down.
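
The two paradigms can be sketched side by side. This is a minimal, runnable illustration, not Instacart's implementation: `StubLLM` and all of its methods are hypothetical stand-ins for real model calls.

```python
from collections import defaultdict

class StubLLM:
    """Canned outputs standing in for real LLM generations."""
    def generate_items(self, ctx):
        # Bottom-up: one flat generation pass over everything relevant.
        return [("chili crisp", "Flavor builders"), ("garlic paste", "Flavor builders"),
                ("electrolyte mix", "Functional hydration")]

    def generate_sections(self, ctx):
        # Top-down: an ordered page plan is decided before any items exist.
        return ["Functional hydration", "Flavor builders"]

    def generate_items_for_section(self, ctx, section):
        canned = {"Functional hydration": ["electrolyte mix"],
                  "Flavor builders": ["chili crisp", "garlic paste"]}
        return canned[section]

def bottom_up(ctx, llm):
    """Generate all items first, then cluster them into themed sections."""
    sections = defaultdict(list)
    for item, theme in llm.generate_items(ctx):
        sections[theme].append(item)          # post-hoc theming; section order emerges
    return dict(sections)

def top_down(ctx, llm):
    """Generate ordered themed sections first, then items per section."""
    return {s: llm.generate_items_for_section(ctx, s)
            for s in llm.generate_sections(ctx)}   # cohesion fixed at the page level
```

The structural difference is where page-level order and cohesion come from: in `bottom_up` they emerge from clustering after the fact, while in `top_down` the LLM commits to the section plan up front.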

Why it matters

Three forces, named in the 2026-02-26 post:

  1. Scaling personalized content. Human-authored content libraries can't keep up with per-user + seasonal + business-objective variance.
  2. Cross-section cohesion. Human-authored sections are created by siloed teams; LLMs can reason over the whole page at once and enforce cross-section narrative.
  3. Adaptability. Shifting business objectives (relevance vs novelty), seasonal placements, new user segments — the LLM generates on demand rather than waiting for content ops to author.

Architectural decomposition

Production generative-recommendation systems typically compose four responsibilities (per Instacart's four-phase Shopping Hub rebuild):

  1. Section / theme generation — user context → ordered themed sections.
  2. Item / keyword generation — themes → retrieval-compatible descriptors → items.
  3. Quality + guardrail filtering — LLM-as-judge, cross-encoder classification, policy guardrails.
  4. Ranking — existing mature ranking infrastructure, unchanged.

Phases 1-3 are the generative content pipeline (the new layer); Phase 4 is the pre-existing ranking stack. See patterns/top-down-cascaded-page-generation for the canonical production shape.
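
The four-phase shape above can be wired as a simple cascade. A hedged sketch under stated assumptions: the four callables are hypothetical interfaces, not Instacart's code, and `ranker` stands in for the unchanged pre-existing ranking stack (Phase 4).

```python
def generate_page(user_ctx, section_gen, item_gen, quality_gate, ranker):
    """Cascade: sections -> items -> filter -> rank, one section at a time."""
    page = []
    for section in section_gen(user_ctx):                 # Phase 1: themed sections
        candidates = item_gen(user_ctx, section)          # Phase 2: descriptors -> items
        kept = [c for c in candidates if quality_gate(section, c)]  # Phase 3: filter
        page.append((section, ranker(user_ctx, kept)))    # Phase 4: mature ranking infra
    return page

# Toy wiring to show the shape end to end (all lambdas are stand-ins).
page = generate_page(
    {"user_id": 42},
    section_gen=lambda ctx: ["Weeknight flavor builders"],
    item_gen=lambda ctx, s: ["chili crisp", "low-quality candidate", "garlic paste"],
    quality_gate=lambda s, item: item != "low-quality candidate",
    ranker=lambda ctx, items: sorted(items),              # pre-existing ranker, untouched
)
```

Note that only Phases 1-3 are new surface area; the ranker's interface (context + candidates in, ordered list out) is exactly what a mature ranking stack already exposes.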

Canonical wiki instance — Instacart Shopping Hub (2026-02-26)

Source: sources/2026-02-26-instacart-our-early-journey-to-transform-discovery-recommendations-with-llms

Instacart's rebuild of Shopping Hub on the generative recommendations platform is the canonical wiki instance:

  • Generated artefact: themed placements ("Flavor builders for weeknight meals", "Functional hydration, lower sugar") with products per placement.
  • Three tenets named: delightful personalization, cross-placement cohesion, adaptability.
  • Cascaded shape: four phases; Phase 4 is unchanged ranking infra.
  • Cost tech: RAG candidate pruning + teacher-student fine-tune + fine-tuned cross-encoder quality gate. See patterns/rag-candidate-pruning-cascade and patterns/fine-tuned-cross-encoder-as-filter.
  • Evaluation: three-prong — multi-level LLM-as-judge + fine-tuned DeBERTa + classical ML metrics. See patterns/llm-as-judge-multi-level-rubric.
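
The cross-encoder quality gate pattern scores each (section, item) pair jointly and keeps items above a threshold. A toy sketch: the Jaccard word-overlap score below is an assumption for illustration only; the real gate is a fine-tuned learned model scoring the concatenated pair.

```python
def score_pair(section: str, item: str) -> float:
    """Joint relevance score for one (section, item) pair (toy word overlap)."""
    s, i = set(section.lower().split()), set(item.lower().split())
    return len(s & i) / max(len(s | i), 1)

def quality_gate(section: str, items: list[str], threshold: float = 0.05) -> list[str]:
    """Keep only items whose pairwise score clears the threshold."""
    return [it for it in items if score_pair(section, it) >= threshold]
```

The key design point survives the toy scorer: unlike a bi-encoder, the gate sees section and item together, so it can reject items that are individually popular but incoherent with the placement's theme.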

Relation to other recommendation architectures

  • vs collaborative filtering — CF matches users to items from item embeddings + user interaction history. No generation.
  • vs learning-to-rank — LTR ranks a retrieved candidate pool. No generation.
  • vs LLM-for-query-understanding — the Intent Engine uses LLMs to understand the query side (user typed "snacks for kids" → tagged query) but doesn't generate recommendation content. Complementary: Intent Engine is supply-side (ranking-time query tagging), generative recommendations is demand-side (no-query discovery content).
  • vs RAG — RAG generates answers grounded in retrieved documents; generative recommendations generate recommendation surface content grounded in user + catalog context. Structurally similar (LLM with retrieved context); different outputs.

Open questions / caveats

  • Quality at scale. LLM-as-judge can measure quality but not act at full-catalog scale; the fine-tuned cross-encoder is a specific answer but has its own drift + calibration costs.
  • Safety + brand alignment. Generative content can hallucinate harmful pairings (Instacart's named example: "alcoholic products for a child's birthday party"). Guardrails live at the filter layer.
  • Cold-start. LLM page design needs user context; new users have none. Cold-start strategy is not publicly disclosed by Instacart.
  • Production A/B outcomes. Instacart's 2026-02-26 post is framed as an "early journey" — architecture disclosed, production uplift numbers not yet shipped.
