
PATTERN Cited by 1 source

Top-down cascaded page generation

Top-down cascaded page generation is the architectural pattern for building an entire personalized content page (discovery feed, recommendation surface, multi-section layout) in multiple LLM phases instead of one monolithic call: first generate the ordered page structure (themed sections / placements / cards), then generate the content (products, items, entities) that belongs inside each section, then filter and rank before caching for the existing serving stack.

The name contrasts with bottoms-up generation — generate all raw items across the page, then cluster them into sections. Top-down imposes structure first so every downstream step inherits page-level cohesion, personalization, and business-objective adaptability.

Shape

[user context] ──► Phase 1: Page design & theme generation
                    │   (LLM + constrained decoding → ordered themes
                    │    + derived signals: personas, freeform concepts)
                    ▼
                   Phase 2: Per-section content generation
                    │   (teacher-student fine-tuned LLM +
                    │    RAG candidate pruning per theme)
                    ▼
                   Phase 3: Quality + diversity filtering
                    │   (dedup, LLM-as-judge, cross-encoder gate,
                    │    business/policy guardrails)
                    ▼
                   Phase 4: Existing ranking stack (unchanged)
                    ▼
                   Page served to user

Phases 1-3 are the generative content pipeline; Phase 4 is the pre-existing mature ranking / serving infrastructure, consumed via a cache of Phase-3 outputs.
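The shape above can be sketched as a plain function cascade. Everything in this sketch (the dataclass, the stub themes, the toy catalog) is illustrative and hypothetical; in the real system each phase is an LLM call or retrieval service, not a dictionary lookup:

```python
from dataclasses import dataclass

@dataclass
class Theme:
    title: str          # e.g. "Functional hydration, lower sugar"
    concepts: list      # Phase-1 derived signals (freeform product concepts)

def phase1_page_design(user_context):
    # Phase 1: an LLM with constrained decoding would emit an ordered,
    # schema-valid list of themed placements. Stubbed with fixed themes.
    return [Theme("Flavor builders for weeknight meals", ["sauces", "spices"]),
            Theme("Functional hydration, lower sugar", ["electrolytes"])]

def phase2_content(theme):
    # Phase 2: a fine-tuned LLM maps the theme to retrieval descriptors,
    # then retrieval fills the section. Stubbed as a toy catalog lookup.
    catalog = {"sauces": ["soy sauce"], "spices": ["smoked paprika"],
               "electrolytes": ["electrolyte mix"]}
    return [p for c in theme.concepts for p in catalog.get(c, [])]

def phase3_filter(sections):
    # Phase 3: dedup + relevance gate + guardrails.
    # Stubbed as cross-section deduplication only.
    seen, out = set(), []
    for theme, products in sections:
        kept = [p for p in products if p not in seen]
        seen.update(kept)
        out.append((theme, kept))
    return out

def build_page(user_context):
    themes = phase1_page_design(user_context)            # Phase 1
    sections = [(t, phase2_content(t)) for t in themes]  # Phase 2
    return phase3_filter(sections)                       # Phase 3 → cache,
                                                         # consumed by Phase 4
```

The point of the skeleton is the seams: each phase boundary is a place where a cheaper specialist (RAG pruning, distilled student model, cross-encoder) can be swapped in without touching the others.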

Canonical wiki instance — Instacart Shopping Hub (2026-02-26)

Source: sources/2026-02-26-instacart-our-early-journey-to-transform-discovery-recommendations-with-llms

Instacart's rebuild of its Shopping Hub on the generative recommendations platform is the canonical instance. The post explicitly benchmarks bottoms-up vs top-down and picks top-down on three tenets (personalization / cohesion / adaptability):

"Bottoms-up: directly generate all possible products to serve to a user, then cluster and organize them into placements. Top-down: begin by generating ordered placements to structure the entire page, then generate products per placement."

Bottoms-up's weakness is named explicitly — broader modelling task, harder to ensure generated products meet diverse per-page requirements, and costly fine-tune iteration as needs evolve. "We felt our adaptability goal would be put at risk." Top-down wins on all three tenets.

Instacart's four-phase instantiation:

  • Phase 1 — LLM page-design agent consumes user context, emits ordered themed placements ("Flavor builders for weeknight meals", "Functional hydration, lower sugar") via constrained decoding against a structured schema. Also emits derived signals — user personas + freeform product concepts ("eggs") — so Phase 2 doesn't redundantly re-derive them; an explicit token-efficiency move.
  • Phase 2 — each theme is mapped to retrieval-compatible descriptors (search queries / taxonomy categories / attribute filters). A teacher-student fine-tuned LLM (Llama / Qwen ablations + LoRA) does the mapping; RAG candidate pruning restricts the keyword-candidate set from 300,000 terms → ~100 nearest neighbours per theme via embedding similarity — a 15–20% all-in cost reduction per generation.
  • Phase 3 — three-layer filter:
      • Embedding-similarity deduplication across placements.
      • LLM-as-judge on a small proportion of users for broad theme quality + brand compliance.
      • Fine-tuned DeBERTa cross-encoder classifies theme-product relevance for every placement's products (patterns/fine-tuned-cross-encoder-as-filter). >99% cheaper than LLM inference — the cost win is what lets this run as a full-catalog filter rather than a top-K reranker.
      • Business + policy guardrails (no alcoholic products for a child's birthday party; original business objectives honoured).
  • Phase 4 — existing product + placement ranking services retrieve Phase-3 cached outputs, rerank, post-process, and return ordered entities to Shopping Hub. Unchanged.
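
The Phase-2 pruning step is ordinary embedding nearest-neighbour search. A minimal sketch with numpy cosine similarity, using random toy vectors at the post's stated scale (the embeddings, dimensionality, and seed are placeholders, not Instacart's; the 300,000 → 100 numbers are from the post):

```python
import numpy as np

def prune_candidates(concept_vec, keyword_vecs, k=100):
    """Return indices of the k keywords most cosine-similar to a
    Phase-1 freeform concept: the RAG candidate-pruning step."""
    kw = keyword_vecs / np.linalg.norm(keyword_vecs, axis=1, keepdims=True)
    q = concept_vec / np.linalg.norm(concept_vec)
    sims = kw @ q                       # cosine similarity to every keyword
    return np.argsort(-sims)[:k]        # top-k nearest neighbours

# Toy scale matching the post: a 300,000-term corpus pruned to 100 per theme.
rng = np.random.default_rng(0)
keyword_vecs = rng.normal(size=(300_000, 64)).astype(np.float32)
# A concept vector sitting near keyword 42, so pruning should surface it.
concept_vec = keyword_vecs[42] + 0.01 * rng.normal(size=64).astype(np.float32)
top = prune_candidates(concept_vec, keyword_vecs, k=100)
```

Only these ~100 survivors go into the Phase-2 prompt, which is where the per-generation token (and therefore cost) saving comes from.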

Why the decomposition is load-bearing

The post's most reusable insight: decomposing the generation task into a cascade is a cost + quality move, not a modelling move. The all-in-one single-prompt approach Instacart started with lacks the seams that Phase-1→Phase-2 RAG-pruning, Phase-2 teacher-student distillation, and Phase-3 cross-encoder filtering can plug into:

"We ultimately found great value in decomposing generation into multiple targeted tasks. This opened the door to using retrieval-augmented generation (RAG) and other techniques that aren't feasible in a single-step model, enabling us to achieve higher quality while improving cost efficiency."

The canonical concrete example: a single-step model would have to pass the full 300K-term keyword corpus as context to maintain precision; the cascade's Phase-1 freeform concepts let Phase 2 cut that corpus to ~100 candidates via embedding similarity, yielding the 15–20% per-generation cost reduction.

The same argument extends to Phase 3 — LLM-as-judge was measuring quality but couldn't take action at full-catalog scale because of per-candidate cost; the cross-encoder's >99% cost reduction lets the same quality signal become a full-catalog filter rather than a sampled measurement. Decomposition opens the door to cheap structural filters that a monolithic generator can't reach.

When the pattern fits

  • Content page with multiple sections where cross-section cohesion matters. Discovery feeds, home pages, dashboards with themed cards.
  • Personalization needs are per-user + per-context. Static content libraries and ranking-only personalization can't keep up.
  • There's an existing mature ranking stack you want to keep. Phase 4 is designed to be the pre-existing ranking infra — "decoupling generative retrieval from mature ranking systems and providing a path to deeper pagewise control as the generative component matures."
  • Content-safety + brand-alignment guardrails are non-optional. Phase-3's guardrail layer is the natural home for these, separate from the generation model so guardrail iteration doesn't require retraining.

When it doesn't

  • Single-entity retrieval — if the page is just a top-K product list, you don't need page design.
  • No structured sections — if placements have no themes or the UI doesn't expose section structure, top-down collapses to bottoms-up.
  • Ground-truth-labelled supervised-ranking pipeline is sufficient — if your existing ranker already hits the personalization tenet, a generative-content layer is extra complexity without a return.

Failure modes

  • Phase 1 schema collapse. Constrained decoding over-constrains the themes and the output looks templated.
  • Phase 2 RAG recall hole. If the embedding space doesn't map Phase-1 freeform concepts to the keyword corpus well, the pruned candidate set is the wrong 100. Recall bounded by embedding quality.
  • Phase 3 judge drift. Cross-encoder trained on HITL-labeled data goes stale as the catalog changes; retraining cadence is an explicit platform responsibility.
  • Phase 4 ranker mis-consumption. Existing rankers were designed for human-authored placements; LLM-generated placements may have different signal distributions (more similar titles, different click priors) — legacy ranker assumptions may no longer hold.
  • Cold-start users. Phase 1 needs user context; new users have none. Instacart's post doesn't disclose the cold-start strategy.
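
The Phase-2 recall hole is measurable offline: given labelled (concept, relevant-keyword) pairs, check how often the relevant keyword survives top-k pruning. A minimal recall@k harness, with toy random embeddings standing in for the real embedding model (all vectors, sizes, and indices here are illustrative):

```python
import numpy as np

def recall_at_k(concept_vecs, keyword_vecs, relevant_idx, k=100):
    """Fraction of concepts whose known-relevant keyword survives top-k
    embedding pruning; this upper-bounds Phase-2 retrieval recall."""
    kw = keyword_vecs / np.linalg.norm(keyword_vecs, axis=1, keepdims=True)
    hits = 0
    for vec, rel in zip(concept_vecs, relevant_idx):
        sims = kw @ (vec / np.linalg.norm(vec))
        topk = np.argsort(-sims)[:k]
        hits += rel in topk
    return hits / len(relevant_idx)

rng = np.random.default_rng(1)
keyword_vecs = rng.normal(size=(10_000, 32)).astype(np.float32)
relevant_idx = [3, 7, 11]
# Concepts embedded close to their relevant keyword, so recall should be 1.0.
concept_vecs = [keyword_vecs[i] + 0.01 * rng.normal(size=32).astype(np.float32)
                for i in relevant_idx]
```

If recall@100 drops as the catalog or concept vocabulary drifts, the pruned candidate set is "the wrong 100" regardless of how good the downstream generation is.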

Relation to other wiki patterns

Seen in
