Retrieval → ranking funnel¶
Definition¶
The retrieval → ranking funnel is the canonical two-stage architecture for search, recommendation, and recommendation-like systems at scale:
- Retrieval (stage 1). A cheap, high-recall primitive narrows an intractably large candidate population (millions to billions of items) to a rank-tractable set — typically 10² to 10⁴ candidates.
- Ranking (stage 2). A more expensive model — cross-encoder, LLM, or a large multi-task multi-label (MTML) network — scores or orders the narrowed set and produces a ranked short-list (top-K) to present to the user.
The asymmetric cost structure — retriever runs on every request against a huge pool, ranker runs only over a small narrowed set — is what makes the overall system affordable at production request volume.
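The cost asymmetry can be made concrete with a back-of-envelope sketch. The numbers below are hypothetical (the section only states the 10² to 10⁴ candidate range), and both stages are treated as linear scans for simplicity — real ANN retrieval is sublinear, which only strengthens the argument:

```python
# Back-of-envelope funnel cost (illustrative numbers, not from the source).
# The cheap retriever touches the whole pool; the expensive ranker touches
# only the narrowed candidate set.

POOL_SIZE = 10_000_000     # hypothetical item population
CANDIDATES = 1_000         # retrieved set, within the 10^2..10^4 range
RETRIEVER_COST = 1e-7      # hypothetical cost units per item scored
RANKER_COST = 1e-3         # hypothetical cost units per candidate ranked

funnel_cost = POOL_SIZE * RETRIEVER_COST + CANDIDATES * RANKER_COST
ranker_everywhere = POOL_SIZE * RANKER_COST  # naive: rank the whole pool

print(f"funnel:            {funnel_cost:.1f} cost units/request")
print(f"ranker everywhere: {ranker_everywhere:.1f} cost units/request")
```

Under these made-up constants the funnel is thousands of times cheaper per request, which is the affordability claim in the paragraph above.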
Structural properties¶
- Retriever recall is the ceiling on end-to-end accuracy. If the correct / best item doesn't survive retrieval, no amount of ranking quality can recover it. The Meta Friend Bubbles post (sources/2026-03-18-meta-friend-bubbles-enhancing-social-discovery-on-facebook-reels) states this directly: "By explicitly retrieving friend-interacted content, we expand the top of the funnel to ensure sufficient candidate volume for downstream ranking stages. This is important because, without it, high-quality friend content may never enter the ranking pipeline in the first place."
- Ranker precision is the ceiling on how cleanly the top-K isolates the right answer. The two ceilings compose multiplicatively — both must meet their bars independently.
- Expanding top-of-funnel is a dial. When a new candidate class (friend-interacted Reels, a new content vertical, a new index) is missing from the ranker's output, the fix is often at retrieval, not ranking.
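The multiplicative composition of the two ceilings can be shown with a worked example (hypothetical probabilities, not measurements from either Meta system):

```python
# The two ceilings compose multiplicatively: the best item must survive
# retrieval AND be placed in the top-K by the ranker.
retriever_recall = 0.90   # P(best item is in the retrieved set)
ranker_hit_rate = 0.80    # P(ranker puts it in the top-K | it was retrieved)

end_to_end = retriever_recall * ranker_hit_rate
print(f"end-to-end hit rate: {end_to_end:.2f}")

# Even a perfect ranker cannot exceed the retrieval ceiling:
perfect_ranker_ceiling = retriever_recall * 1.0
```

This is why a missing candidate class is a retrieval problem: when `retriever_recall` is zero for that class, no ranking improvement moves the end-to-end number.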
The retriever choice space¶
- Heuristic retrieval. Domain rules — ownership, social graph + closeness threshold, code-graph traversal, time windows. Fast, interpretable, limited to encoded knowledge. Used by Meta RCA (ownership + code graph) and Meta Friend Bubbles (close-friend candidate sourcing via viewer-friend closeness).
- Lexical (BM25). Term-frequency scoring. Fast, interpretable, limited to surface keyword match. Canonical for text search.
- Vector + ANN. Learned embeddings + approximate nearest-neighbour search. Handles semantic similarity; needs embedding infra.
- Hybrid. Combined lexical + vector. Industry default for document search.
- Two-tower / multi-tower recall models. Purpose-trained retrieval models for recommendation — typical at Meta / Google / YouTube / TikTok scale. Not named in this Meta post, but the standard family for "friend-interacted content retrieval."
Choice is domain-driven. Monorepo RCA has structured ownership + code graph (heuristics win); open-domain document search benefits from hybrid; Reels-scale recommendation typically combines heuristic closeness-based retrieval with embedding-based video-similarity recall.
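For the hybrid case, the section does not specify how lexical and vector results are combined; reciprocal-rank fusion (RRF) is a common choice in document search, so the sketch below uses it as an assumption, with made-up document IDs:

```python
# Reciprocal-rank fusion (RRF): fuse several ranked candidate lists into
# one by summing 1/(k + rank) per document. Assumed fusion method; the
# source names hybrid retrieval but not a specific fusion scheme.
from collections import defaultdict


def rrf(ranked_lists, k=60):
    """Return doc IDs ordered by summed reciprocal-rank score."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


bm25_hits = ["d3", "d1", "d7"]     # lexical (BM25) ranking
vector_hits = ["d1", "d9", "d3"]   # ANN / embedding ranking
fused = rrf([bm25_hits, vector_hits])
```

Documents that appear high in both lists (here `d1` and `d3`) float to the top, which is the point of the hybrid default: each primitive covers the other's blind spot.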
The ranker choice space¶
- MTML models. Multi-task multi-label deep networks with shared encoders + task-specific heads, optimising many engagement targets jointly. The industry default for large-scale recommendation ranking (Meta, Google, TikTok). Meta Friend Bubbles: early-stage + late-stage MTML models with new bubble-conditioned tasks.
- LLM ranker. LLM scores / orders the narrowed set in natural language. Meta RCA: fine-tuned Llama-2 (7B) running ranking-via-election.
- Cross-encoder. Smaller Transformer scoring (query, candidate). Cheaper than LLM; less reasoning capacity. Canonical for document search reranking.
- Pointwise classifier. Small domain-trained model outputs a score per (query, candidate). Cheapest; weakest.
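Whichever ranker family is chosen, the two stages compose through the same narrow interface: retrieve a small set, score each (query, candidate) pair, return the top-K. A minimal sketch with toy stand-ins (the keyword retriever and length-based scorer below are illustrative placeholders, not any real system's logic):

```python
# The stage interface the whole section assumes: a cheap high-recall
# retrieve() followed by an expensive rank() over the narrowed set.
# score_fn is a stand-in for an MTML net, cross-encoder, or LLM ranker.

def retrieve(query, pool, limit):
    """Toy heuristic retriever: keep items containing the query string."""
    return [item for item in pool if query in item][:limit]


def rank(query, candidates, score_fn, top_k):
    """Score each candidate against the query; return the top-K."""
    return sorted(candidates, key=lambda c: score_fn(query, c), reverse=True)[:top_k]


pool = ["cat video", "cat photo", "dog video", "cat meme video"]
candidates = retrieve("cat", pool, limit=100)
top = rank("cat", candidates, score_fn=lambda q, c: len(c), top_k=2)
```

Note that `rank()` never sees items the retriever dropped — the recall-ceiling property falls directly out of this interface.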
Canonical wiki instances¶
- Meta Friend Bubbles (2026-03-18) — recommendation instance. Heuristic closeness-based retrieval + MTML ranking with conditional-probability bubble objective. Canonical datum for expanding top of funnel as the fix for missing candidate class.
- Meta RCA (2024-06) — RCA / LLM instance. Heuristic ownership + code-graph retrieval + Llama-2 7B ranker via ranking-via-election. See patterns/retrieve-then-rank-llm for the LLM-specific pattern.
Relation to other framings¶
- patterns/retrieve-then-rank-llm is the LLM-specific pattern instance of this funnel concept — when the stage-2 ranker is specifically an LLM.
- concepts/llm-cascade is the sibling cascade pattern at model-size level (small LLM → large LLM), orthogonal to the stage level (retriever → ranker) described here. Both are cascades; they compose.
Caveats¶
- Cascading failure modes. A bug in the retriever (missing rule, stale embeddings, broken graph traversal) can systematically bias the candidate set in a way the ranker cannot detect. End-to-end ground-truth evaluation catches this; unit-testing each stage does not.
- Retriever recall must be measured as a first-class metric — not just ranker precision / NDCG. Without a retriever-recall number, you don't know where your ceiling is.
- Expanding top-of-funnel is not free. More candidates means more ranker cost. The dial is bounded by ranker latency / throughput budget.
- Feedback loops bias the retriever. If the retriever only surfaces items the ranker already scored highly, the system can collapse onto a shrinking candidate set. A continuous feedback loop (patterns/closed-feedback-loop-ai-features) must feed all candidate sources, not just top-ranked outcomes.
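Measuring retriever recall as a first-class metric, per the caveat above, reduces to a set-overlap computation at the stage boundary. A minimal sketch with hypothetical query data:

```python
# Retriever recall at the stage boundary: of the ground-truth relevant
# items per query, what fraction survived retrieval? Hypothetical data.

def measure_retriever_recall(retrieved_by_query, relevant_by_query):
    """Micro-averaged recall of the retrieval stage across queries."""
    hits = total = 0
    for query, relevant in relevant_by_query.items():
        retrieved = set(retrieved_by_query.get(query, []))
        hits += len(relevant & retrieved)
        total += len(relevant)
    return hits / total if total else 0.0


retrieved = {"q1": ["a", "b", "c"], "q2": ["x"]}
relevant = {"q1": {"a", "d"}, "q2": {"x", "y"}}
recall = measure_retriever_recall(retrieved, relevant)
```

Here half the relevant items never entered the ranking stage — exactly the ceiling that ranker-side metrics like NDCG cannot reveal.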
Seen in¶
- sources/2026-03-18-meta-friend-bubbles-enhancing-social-discovery-on-facebook-reels — canonical recommendation-system instance with explicit top-of-funnel expansion.
- sources/2024-08-23-meta-leveraging-ai-for-efficient-incident-response — canonical LLM-ranker instance; same structural pattern in a non-recommendation domain.
- sources/2026-02-27-pinterest-bridging-the-gap-online-offline-discrepancy-l1-cvr — the Pinterest ads funnel (retrieval → L1 → L2 → auction) is the wiki's canonical multi-stage recommendation funnel example, with two explicitly tracked recall metrics at the L1 boundary: retrieval recall (among final auction winners, how many came from L1 output?) and ranking recall (among the top-K by downstream utility, how many appeared in L1 output?). Pinterest's closing observation generalizes: "beyond a certain point, L1 model quality is not the bottleneck — the funnel and utility design are." A canonical demonstration that recall ceilings at each stage boundary cap end-to-end quality independently of within-stage model improvement — a structural lesson that goes beyond the two-stage simplification.
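The two L1-boundary metrics from the Pinterest entry are both coverage fractions against the L1 output set. A toy sketch (item IDs and numbers are illustrative, not Pinterest's data):

```python
# Pinterest-style boundary metrics, in toy form:
#   retrieval recall: fraction of final auction winners covered by L1 output
#   ranking recall:   fraction of top-K-by-utility items covered by L1 output

def coverage(target_items, l1_output):
    """Fraction of target_items that appeared in the L1 output set."""
    return len(set(target_items) & set(l1_output)) / len(target_items)


l1_output = {"a", "b", "c", "d"}
auction_winners = ["a", "b", "e"]        # items that won the final stage
top_by_utility = ["a", "c", "d", "f"]    # best K by downstream utility

retrieval_recall = coverage(auction_winners, l1_output)
ranking_recall = coverage(top_by_utility, l1_output)
```

Tracking both separates "L1 misses what the auction ultimately wants" from "L1 misses what would have been most valuable downstream" — two different holes in the same boundary.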
Related¶
- concepts/heuristic-retrieval — the stage-1 primitive used by both canonical instances.
- concepts/llm-based-ranker — one stage-2 option.
- concepts/multi-task-multi-label-ranking — the recommendation-ranker option.
- concepts/hybrid-retrieval-bm25-vectors — the document-search stage-1 option.
- patterns/retrieve-then-rank-llm — the LLM-specific pattern form.
- concepts/llm-cascade — the sibling cascade at model-size level.
- systems/meta-friend-bubbles — recommendation instance.
- systems/meta-rca-system — LLM/RCA instance.