
CONCEPT Cited by 3 sources

Retrieval → ranking funnel

Definition

The retrieval → ranking funnel is the canonical two-stage architecture for recommendation, search, and similar candidate-selection systems at scale:

  1. Retrieval (stage 1). A cheap, high-recall primitive narrows an intractably large candidate population (millions to billions of items) to a rank-tractable set — typically 10² to 10⁴ candidates.
  2. Ranking (stage 2). A more expensive model — cross-encoder, LLM, or a large multi-task multi-label (MTML) network — scores or orders the narrowed set and produces a ranked short-list (top-K) to present to the user.

The asymmetric cost structure — retriever runs on every request against a huge pool, ranker runs only over a small narrowed set — is what makes the overall system affordable at production request volume.
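The asymmetric structure can be sketched in a few lines of Python. The score functions are toy stand-ins (all names here are hypothetical, not from the source), but the shape is the point: a cheap pass over everything, an expensive pass only over the survivors.

```python
import heapq

def cheap_score(query, item):
    # Stand-in for a stage-1 primitive (heuristic rule, BM25, ANN
    # distance): must be fast enough to touch every candidate.
    return -abs(item - query)

def expensive_score(query, item):
    # Stand-in for a stage-2 model (cross-encoder, MTML net, LLM):
    # assumed orders of magnitude costlier per call.
    return -abs(item - query) ** 2

def funnel(query, pool, retrieve_n=100, top_k=5):
    # Stage 1: high-recall narrowing over the full pool.
    candidates = heapq.nlargest(retrieve_n, pool,
                                key=lambda it: cheap_score(query, it))
    # Stage 2: the expensive model runs only on the narrowed set.
    ranked = sorted(candidates,
                    key=lambda it: expensive_score(query, it), reverse=True)
    return ranked[:top_k]

pool = list(range(1_000_000))
top = funnel(500_123, pool)
```

The retriever touches a million items with a trivial key; the ranker sees at most `retrieve_n` of them. That ratio, not either model in isolation, is what keeps per-request cost bounded.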

Structural properties

  • Retriever recall is the ceiling on end-to-end accuracy. If the correct / best item doesn't survive retrieval, no amount of ranking quality can recover it. The Meta Friend Bubbles post (sources/2026-03-18-meta-friend-bubbles-enhancing-social-discovery-on-facebook-reels) states this directly: "By explicitly retrieving friend-interacted content, we expand the top of the funnel to ensure sufficient candidate volume for downstream ranking stages. This is important because, without it, high-quality friend content may never enter the ranking pipeline in the first place."
  • Ranker precision is the ceiling on how cleanly the top-K isolates the right answer. The two ceilings compose multiplicatively — both must meet their bars independently.
  • Expanding top-of-funnel is a dial. When a new candidate class (friend-interacted Reels, a new content vertical, a new index) is missing from the ranker's output, the fix is often at retrieval, not ranking.
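The multiplicative composition of the two ceilings is worth seeing as a worked number (illustrative figures, not from the source): if the best item only survives retrieval 60% of the time, even a near-perfect ranker cannot lift end-to-end hit rate above that.

```python
def end_to_end_hit_rate(retriever_recall, ranker_precision):
    # The best item must BOTH survive retrieval AND be placed in the
    # top-K by the ranker; assuming independent stages, the rates
    # compose multiplicatively.
    return retriever_recall * ranker_precision

leaky_retriever = end_to_end_hit_rate(0.60, 0.95)   # ~0.57
strong_both     = end_to_end_hit_rate(0.95, 0.95)   # ~0.90
```

Fixing the ranker moves the second factor; a missing candidate class moves the first, which is why the fix for that failure mode lives at retrieval.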

The retriever choice space

  • Heuristic retrieval. Domain rules — ownership, social graph + closeness threshold, code-graph traversal, time windows. Fast, interpretable, limited to encoded knowledge. Used by Meta RCA (ownership + code graph) and Meta Friend Bubbles (close-friend candidate sourcing via viewer-friend closeness).
  • Lexical (BM25). Term-frequency scoring. Fast, interpretable, limited to surface keyword match. Canonical for text search.
  • Vector + ANN. Learned embeddings + approximate nearest-neighbour search. Handles semantic similarity; needs embedding infra.
  • Hybrid. Combined lexical + vector. Industry default for document search.
  • Two-tower / multi-tower recall models. Purpose-trained retrieval models for recommendation — typical at Meta / Google / YouTube / TikTok scale. Not named by this Meta post but the standard family for "friend-interacted content retrieval."
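As an illustration of the hybrid option, a minimal score blend over toy lexical and vector signals (the overlap count stands in for BM25, the blend weight and all names are hypothetical):

```python
import math

def lexical_score(query_terms, doc_terms):
    # Crude term-overlap stand-in for a real BM25 implementation.
    return sum(1 for t in query_terms if t in doc_terms)

def cosine(u, v):
    # Cosine similarity between two dense embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def hybrid_score(query_terms, q_vec, doc, alpha=0.5):
    # Hybrid retrieval = weighted blend of lexical and semantic
    # signals; alpha is a tuning knob, not a value from the source.
    return (alpha * lexical_score(query_terms, doc["terms"])
            + (1 - alpha) * cosine(q_vec, doc["vec"]))

docs = [
    {"id": "a", "terms": {"friend", "reels"}, "vec": [1.0, 0.0]},
    {"id": "b", "terms": {"news"},            "vec": [0.9, 0.1]},
]
ranked = sorted(docs, key=lambda d: hybrid_score({"friend"}, [1.0, 0.0], d),
                reverse=True)
```

In production the two signal lists come from separate indexes (inverted index, ANN index) and are fused, but the blend logic is the same idea.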

Choice is domain-driven. Monorepo RCA has structured ownership + code graph (heuristics win); open-domain document search benefits from hybrid; Reels-scale recommendation typically combines heuristic closeness-based retrieval with embedding-based video-similarity recall.

The ranker choice space

  • MTML models. Multi-task multi-label deep networks with shared encoders + task-specific heads, optimising many engagement targets jointly. The industry default for large-scale recommendation ranking (Meta, Google, TikTok). Meta Friend Bubbles: early-stage + late-stage MTML models with new bubble-conditioned tasks.
  • LLM ranker. LLM scores / orders the narrowed set in natural language. Meta RCA: fine-tuned Llama-2 (7B) running ranking-via-election.
  • Cross-encoder. Smaller Transformer scoring (query, candidate). Cheaper than LLM; less reasoning capacity. Canonical for document search reranking.
  • Pointwise classifier. Small domain-trained model outputs a score per (query, candidate). Cheapest; weakest.
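The pointwise end of this spectrum is simple enough to sketch directly: each (query, candidate) pair is scored independently, then sorted. The score function below is a toy overlap count standing in for a trained classifier (all names hypothetical).

```python
def pointwise_rank(score_fn, query, candidates, k):
    # Pointwise ranking: independent per-pair scores, then a sort.
    # Cheapest stage-2 option; no cross-candidate comparison.
    return sorted(candidates, key=lambda c: score_fn(query, c),
                  reverse=True)[:k]

def overlap(query, candidate):
    # Toy stand-in for a small domain-trained scoring model.
    return len(set(query.split()) & set(candidate.split()))

docs = ["friend reels video", "trending news clip", "close friend story"]
top2 = pointwise_rank(overlap, "friend story", docs, k=2)
```

Cross-encoders, MTML networks, and LLM rankers slot into the same interface: only `score_fn` changes, and with it the per-candidate cost that bounds how wide the funnel can be.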

Canonical wiki instances

  • Meta Friend Bubbles (2026-03-18) — recommendation instance. Heuristic closeness-based retrieval + MTML ranking with conditional-probability bubble objective. Canonical datum for expanding top of funnel as the fix for missing candidate class.
  • Meta RCA (2024-06) — RCA / LLM instance. Heuristic ownership + code-graph retrieval + Llama-2 7B ranker via ranking-via-election. See patterns/retrieve-then-rank-llm for the LLM-specific pattern.

Relation to other framings

  • patterns/retrieve-then-rank-llm is the LLM-specific pattern instance of this funnel concept — when the stage-2 ranker is specifically an LLM.
  • concepts/llm-cascade is the sibling cascade pattern at model-size level (small LLM → large LLM), orthogonal to the stage level (retriever → ranker) described here. Both are cascades; they compose.

Caveats

  • Cascading failure modes. A bug in the retriever (missing rule, stale embeddings, broken graph traversal) can systematically bias the candidate set in a way the ranker cannot detect. End-to-end ground-truth evaluation catches this; unit-testing each stage does not.
  • Retriever recall must be measured as a first-class metric — not just ranker precision / NDCG. Without a retriever-recall number, you don't know where your ceiling is.
  • Expanding top-of-funnel is not free. More candidates means more ranker cost. The dial is bounded by ranker latency / throughput budget.
  • Feedback loops bias the retriever. If the retriever only surfaces items the ranker already scored highly, the system can collapse onto a shrinking candidate set. A continuous feedback loop (patterns/closed-feedback-loop-ai-features) must feed all candidate sources, not just top-ranked outcomes.
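The bounded-dial caveat reduces to back-of-envelope arithmetic: the latency budget divided by per-candidate ranker cost caps how far top-of-funnel can expand (illustrative numbers only, not from the source).

```python
def max_candidates(latency_budget_ms, ranker_ms_per_candidate, parallelism=1):
    # Upper bound on the top-of-funnel dial: the ranker can only
    # score as many candidates as fit in the request's latency
    # budget, scaled by however many scorers run in parallel.
    return int(latency_budget_ms * parallelism / ranker_ms_per_candidate)

budget = max_candidates(50, 0.5)       # 100 candidates fit
wider  = max_candidates(50, 0.5, 8)    # 8-way parallel scoring: 800
```

Expanding a candidate class without re-checking this bound silently crowds out other sources at the ranker's input.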
