
CONCEPT Cited by 4 sources

Retrieval → ranking funnel

Definition

The retrieval → ranking funnel is the canonical two-stage architecture for recommendation, search, and recommendation-like systems at scale:

  1. Retrieval (stage 1). A cheap, high-recall primitive narrows an intractably large candidate population (millions to billions of items) to a rank-tractable set — typically 10² to 10⁴ candidates.
  2. Ranking (stage 2). A more expensive model — cross-encoder, LLM, or a large MTML network — scores or orders the narrowed set and produces a ranked short-list (top-K) to present to the user.

The asymmetric cost structure — retriever runs on every request against a huge pool, ranker runs only over a small narrowed set — is what makes the overall system affordable at production request volume.
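A minimal sketch of the two stages, assuming a toy in-memory corpus. A brute-force dot product stands in for the ANN retriever. A cosine-similarity scorer wrapped in `CountingScorer`, a hypothetical name, stands in for the expensive ranker. The call counter makes the cost asymmetry concrete: the ranker runs once per retrieved candidate, not once per item in the pool.

```python
import heapq
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def retrieve(query, corpus, n):
    """Stage 1: cheap score (dot product) over the whole pool, keep the top-n ids.

    Brute force stands in for an ANN index in this sketch.
    """
    return heapq.nlargest(n, corpus, key=lambda item_id: dot(query, corpus[item_id]))


def rank(query, candidate_ids, corpus, scorer, k):
    """Stage 2: the expensive scorer runs only on the n retrieved candidates."""
    ordered = sorted(candidate_ids, key=lambda c: scorer(query, corpus[c]), reverse=True)
    return ordered[:k]


class CountingScorer:
    """Hypothetical stand-in for a cross-encoder / MTML / LLM ranker.

    Scores with cosine similarity and counts invocations.
    """

    def __init__(self):
        self.calls = 0

    def __call__(self, query, item):
        self.calls += 1
        return cosine(query, item)


corpus = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [2.0, 2.0],
    "e": [-1.0, 0.0],
}
query = [1.0, 0.0]

candidates = retrieve(query, corpus, n=3)  # dot product favours the long vector "d"
scorer = CountingScorer()
top_k = rank(query, candidates, corpus, scorer, k=2)  # ranker reorders: "d" drops out
print(candidates, top_k, scorer.calls)
```

Here the ranker runs 3 times over a pool of 5. In production the ratio is 10² to 10⁴ ranker calls against millions to billions of items, which is where the affordability comes from.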

SilverTorch face — widened funnel (2026-05-26)

Meta's SilverTorch post (Source: sources/2026-05-26-meta-silvertorch-index-as-model-a-new-retrieval-paradigm-for-recommendation-systems) describes a structural shift in how much intelligence runs at the retrieval stage:

"In traditional service-based systems, retrieval is usually constrained to a relatively narrow ANN result set, scored mostly by simple embedding similarity, with richer relevance modeling deferred to late-stage ranking. SilverTorch unlocked headroom. By keeping ANN search, filtering, and scoring inside one model, it can widen the funnel substantially. Instead of handing only a small set of candidates downstream, it can bring one to two orders of magnitude more candidates through additional learned relevance layers before final ranking. That makes retrieval contribute meaningfully to recommendation quality, not just a fast pruning step."

Two new capabilities run inside the retrieval forward pass under Index as Model:

  • Neural reranking: "multi-layer perceptrons, stacked self-attention, or more structured interaction models such as mixture of logits" applied to retrieval candidates, producing a richer relevance score than dot-product similarity.
  • Multi-task scoring — composite score over multiple engagement-action probabilities (like / share / comment) produced inside retrieval, not deferred to late-stage ranking.
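A sketch of multi-task scoring. It assumes the composite is a linear weighted sum of per-task probabilities. That is a common choice, but the SilverTorch post as quoted here does not specify how the probabilities are combined. The weights and item ids are hypothetical.

```python
# Hypothetical per-task weights; real systems tune these against product goals.
TASK_WEIGHTS = {"like": 1.0, "share": 3.0, "comment": 2.0}


def composite_score(task_probs, weights=TASK_WEIGHTS):
    """Weighted sum of predicted engagement probabilities for one candidate."""
    return sum(w * task_probs.get(task, 0.0) for task, w in weights.items())


def multitask_rerank(candidates, k):
    """candidates: list of (item_id, {task: probability}) coming out of retrieval."""
    ranked = sorted(candidates, key=lambda c: composite_score(c[1]), reverse=True)
    return [item_id for item_id, _ in ranked[:k]]


candidates = [
    ("v1", {"like": 0.30, "share": 0.01, "comment": 0.02}),  # 0.30 + 0.03 + 0.04 ~ 0.37
    ("v2", {"like": 0.10, "share": 0.08, "comment": 0.05}),  # 0.10 + 0.24 + 0.10 ~ 0.44
    ("v3", {"like": 0.20, "share": 0.02, "comment": 0.01}),  # 0.20 + 0.06 + 0.02 ~ 0.28
]
print(multitask_rerank(candidates, k=2))
```

Ranking on like probability alone would put v1 first. The composite promotes v2 because shares and comments carry higher weight.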

This does not abolish the funnel. Late-stage ranking still runs over the narrowed set. What changes is the width of the candidate pool that reaches ranking and the quality of the survivors: "more candidates survive early retrieval, and they are screened by more sophisticated, multi-objective scoring before being passed to the final ranking." The "retriever recall is the ceiling" property (see Structural properties below) still holds. However, it binds much less tightly when retrieval can run multi-task scoring over a pool one to two orders of magnitude wider.

Structural properties

  • Retriever recall is the ceiling on end-to-end accuracy. If the correct / best item doesn't survive retrieval, no amount of ranking quality can recover it. The Meta Friend Bubbles post (sources/2026-03-18-meta-friend-bubbles-enhancing-social-discovery-on-facebook-reels) states this directly: "By explicitly retrieving friend-interacted content, we expand the top of the funnel to ensure sufficient candidate volume for downstream ranking stages. This is important because, without it, high-quality friend content may never enter the ranking pipeline in the first place."
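  • Ranker precision is the ceiling on how cleanly the top-K isolates the right answer. The two ceilings compose multiplicatively. End-to-end accuracy is roughly retriever recall times the ranker's accuracy on the candidates it receives, so each stage must meet its bar independently.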
  • Expanding top-of-funnel is a dial. When a new candidate class (friend-interacted Reels, a new content vertical, a new index) is missing from the ranker's output, the fix is often at retrieval, not ranking.
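A worked sketch of how the two ceilings compose. It assumes a single best item per query and treats the stages as a simple chain of probabilities; the numbers are illustrative, not measured.

```python
def end_to_end_hit_rate(retriever_recall, ranker_conditional):
    """P(best item reaches the final top-K).

    retriever_recall:   P(best item survives retrieval), i.e. recall@N
    ranker_conditional: P(ranker puts it in the top-K | it was retrieved)
    """
    return retriever_recall * ranker_conditional


baseline = end_to_end_hit_rate(0.80, 0.90)          # about 0.72
perfect_ranker = end_to_end_hit_rate(0.80, 1.00)    # 0.80: capped by retrieval
better_retrieval = end_to_end_hit_rate(0.90, 0.90)  # about 0.81
print(round(baseline, 2), round(perfect_ranker, 2), round(better_retrieval, 2))
```

Even a perfect ranker cannot lift the hit rate above retriever recall. In this example, a 10-point recall gain with the ranker unchanged edges out a perfect ranker at the old recall.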

The retriever choice space

  • Heuristic retrieval. Domain rules — ownership, social graph + closeness threshold, code-graph traversal, time windows. Fast, interpretable, limited to encoded knowledge. Used by Meta RCA (ownership + code graph) and Meta Friend Bubbles (close-friend candidate sourcing via viewer-friend closeness).
  • Lexical (BM25). Term-frequency / inverse-document-frequency scoring with document-length normalisation. Fast, interpretable, limited to surface keyword match. Canonical for text search.
  • Vector + ANN. Learned embeddings + approximate nearest-neighbour search. Handles semantic similarity; needs embedding infra.
  • Hybrid. Combined lexical + vector. Industry default for document search.
  • Two-tower / multi-tower recall models. Purpose-trained retrieval models for recommendation, typical at Meta / Google / YouTube / TikTok scale. The Meta Friend Bubbles post does not name one, but this is the standard family for "friend-interacted content retrieval."
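A compact sketch of the lexical option using Okapi BM25 with the common defaults k1 = 1.2 and b = 0.75. It uses the non-negative IDF variant ln(1 + (N − n + 0.5) / (n + 0.5)), which is the form Lucene uses. Whitespace tokenisation and the toy documents are assumptions. Production systems score through an inverted index rather than looping over every document.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of each tokenised doc for the query terms."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for q in query_terms:
            if tf[q] == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[q] + 0.5) / (doc_freq[q] + 0.5))
            # Term-frequency saturation (k1) plus document-length normalisation (b).
            norm = tf[q] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[q] * (k1 + 1) / norm
        scores.append(s)
    return scores


def bm25_retrieve(query_terms, docs, n):
    """Stage-1 lexical retrieval: indices of the top-n docs by BM25."""
    scores = bm25_scores(query_terms, docs)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:n]


docs = [
    "friend shared a reel".split(),
    "ranking model for reels".split(),
    "retrieval then ranking funnel".split(),
]
print(bm25_retrieve(["ranking", "funnel"], docs, n=2))
```

The rarer term "funnel" gets a higher IDF than "ranking", so the document containing both ranks first.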

Choice is domain-driven. Monorepo RCA has structured ownership + code graph (heuristics win); open-domain document search benefits from hybrid; Reels-scale recommendation typically combines heuristic closeness-based retrieval with embedding-based video-similarity recall.
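For the hybrid option, one common way to merge lexical and vector candidate lists is reciprocal rank fusion (RRF). RRF needs only ranks, not comparable scores. The page does not say which fusion method any particular system uses, so RRF and its conventional k = 60 are assumptions here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked id lists: score(d) = sum over lists of 1 / (k + rank_of_d).

    Ranks are 1-based; an id absent from a list contributes nothing for that list.
    Ties are broken by id so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


lexical = ["d3", "d1", "d7"]  # e.g. BM25 order
vector = ["d1", "d9", "d3"]   # e.g. ANN order
print(reciprocal_rank_fusion([lexical, vector]))
```

Items that both retrievers find (d1, d3) rise above items that only one retriever surfaces.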

The ranker choice space

  • MTML models. Multi-task multi-label deep networks with shared encoders + task-specific heads, optimising many engagement targets jointly. The industry default for large-scale recommendation ranking (Meta, Google, TikTok). Meta Friend Bubbles: early-stage + late-stage MTML models with new bubble-conditioned tasks.
  • LLM ranker. LLM scores / orders the narrowed set in natural language. Meta RCA: fine-tuned Llama-2 (7B) running ranking-via-election.
  • Cross-encoder. Smaller Transformer that encodes each (query, candidate) pair jointly and outputs a relevance score. Cheaper than an LLM; less reasoning capacity. Canonical for document search reranking.
  • Pointwise classifier. Small domain-trained model outputs a score per (query, candidate). Cheapest; weakest.
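A toy forward pass showing the MTML shape described above. One shared encoder is computed once per candidate feature vector, then each engagement target gets its own task-specific head. The class name and hand-set weights are hypothetical. Real MTML rankers are deep networks trained jointly on all tasks, and their per-task outputs can then be combined into one ranking score (for example, by a weighted sum).

```python
import math


def relu(xs):
    return [max(0.0, x) for x in xs]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, bias, xs):
    """Dense layer: weights is a list of rows, one row per output unit."""
    return [sum(w * x for w, x in zip(row, xs)) + b for row, b in zip(weights, bias)]


class TinyMTML:
    """Hypothetical multi-task ranker: shared encoder plus one sigmoid head per task."""

    def __init__(self, shared_w, shared_b, heads):
        self.shared_w = shared_w
        self.shared_b = shared_b
        self.heads = heads  # task name -> (weight vector, bias)

    def predict(self, features):
        # The shared representation is computed once and reused by every head.
        hidden = relu(linear(self.shared_w, self.shared_b, features))
        return {
            task: sigmoid(sum(w * h for w, h in zip(wv, hidden)) + b)
            for task, (wv, b) in self.heads.items()
        }


model = TinyMTML(
    shared_w=[[1.0, 0.0], [0.0, 1.0]],
    shared_b=[0.0, 0.0],
    heads={"like": ([1.0, 0.0], 0.0), "share": ([0.0, 1.0], -1.0)},
)
probs = model.predict([2.0, -1.0])
print(probs)  # like ~0.881, share ~0.269
```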

Canonical wiki instances

  • Meta Friend Bubbles (2026-03-18) — recommendation instance. Heuristic closeness-based retrieval + MTML ranking with conditional-probability bubble objective. Canonical datum for expanding the top of the funnel as the fix for a missing candidate class.
  • Meta RCA (2024-06) — RCA / LLM instance. Heuristic ownership + code-graph retrieval + Llama-2 7B ranker via ranking-via-election. See patterns/retrieve-then-rank-llm for the LLM-specific pattern.

Relation to other framings

  • patterns/retrieve-then-rank-llm is the LLM-specific pattern instance of this funnel concept — when the stage-2 ranker is specifically an LLM.
  • concepts/llm-cascade is the sibling cascade pattern at model-size level (small LLM → large LLM), orthogonal to the stage level (retriever → ranker) described here. Both are cascades; they compose.

Caveats

  • Cascading failure modes. A bug in the retriever (missing rule, stale embeddings, broken graph traversal) can systematically bias the candidate set in a way the ranker cannot detect. End-to-end ground-truth evaluation catches this; unit-testing each stage in isolation may not.
  • Retriever recall must be measured as a first-class metric — not just ranker precision / NDCG. Without a retriever-recall number, you don't know where your ceiling is.
  • Expanding top-of-funnel is not free. More candidates means more ranker cost. The dial is bounded by ranker latency / throughput budget.
  • Feedback loops bias the retriever. If the retriever only surfaces items the ranker already scored highly, the system can collapse onto a shrinking candidate set. A continuous feedback loop (patterns/closed-feedback-loop-ai-features) must feed all candidate sources, not just top-ranked outcomes.
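A sketch of tracking retriever recall as a first-class metric alongside the end-to-end number. It assumes an offline evaluation set with labelled relevant items per query. The function names and data layout are hypothetical. The gap between retriever recall and 1.0 is lost at retrieval, and any further gap down to end-to-end recall is lost at ranking.

```python
def recall_at(ids, relevant):
    """Fraction of relevant items present in ids; None if nothing is labelled."""
    if not relevant:
        return None
    return len(set(ids) & set(relevant)) / len(relevant)


def funnel_report(eval_set, n, k):
    """eval_set: list of (retrieved_ids, final_ids, relevant_ids) per query.

    Returns mean retriever recall@n (the ceiling) and mean end-to-end recall@k.
    Assumes at least one query has labelled relevant items.
    """
    retr, e2e = [], []
    for retrieved, final, relevant in eval_set:
        r = recall_at(retrieved[:n], relevant)
        if r is None:
            continue  # skip queries with no labels
        retr.append(r)
        e2e.append(recall_at(final[:k], relevant))
    return {
        "retriever_recall@n": sum(retr) / len(retr),
        "end_to_end_recall@k": sum(e2e) / len(e2e),
    }


eval_set = [
    (["a", "b", "c", "d"], ["b", "a"], {"a", "x"}),  # "x" never retrieved: retrieval loss
    (["e", "f", "g", "h"], ["f", "e"], {"f"}),       # found and kept
    (["i", "j", "k", "l"], ["j", "i"], {"k"}),       # retrieved, then dropped: ranking loss
]
print(funnel_report(eval_set, n=4, k=2))
```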
