

Decoupled parallel retrieval pipelines

Pattern

Run lexical (inverted-index + BM25) and dense semantic (encoder + ANN) retrieval as parallel, independent pipelines, fed by a shared query-preprocessing stage, and merge candidates only at the ranker — not at retrieval time, not via score fusion upstream.

                query
        ┌──────────────────────┐
        │ Query preprocessing  │ ← tokenize / normalize / rewrite
        │ (shared stage)       │
        └──────────┬───────────┘
        ┌──────────┴───────────┐
        │                      │
        ▼                      ▼
  ┌──────────────┐      ┌──────────────┐
  │ Lexical      │      │ Semantic     │
  │ Unicorn      │      │ SSR + Faiss  │
  │ → candidates │      │ → candidates │
  └──────┬───────┘      └──────┬───────┘
         │                     │
         └──────────┬──────────┘
                    ▼
         ┌────────────────────┐
         │ L2 MTML ranker     │
         │  merges + reranks  │
         │  features: TF-IDF, │
         │  BM25, cosine, …   │
         └────────────────────┘
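The shape of the pattern can be sketched as a small program. Every name below is a hypothetical stand-in (toy term-count "BM25", canned semantic scores), not Meta's actual stack; the point is only the structure: one shared preprocessing call, two arms submitted independently, fusion deferred to the ranker.

```python
# Sketch of the decoupled-pipeline shape; all components are stand-ins.
from concurrent.futures import ThreadPoolExecutor

def preprocess(query: str) -> dict:
    # Shared stage: tokenize / normalize once, for both arms.
    return {"raw": query, "tokens": query.lower().split()}

def lexical_retrieve(q: dict, k: int = 3) -> list:
    # Stand-in for an inverted-index + BM25 lookup.
    docs = {"d1": "parallel retrieval pipelines", "d2": "dense encoders"}
    return [(doc_id, sum(t in text for t in q["tokens"]))
            for doc_id, text in docs.items()][:k]

def semantic_retrieve(q: dict, k: int = 3) -> list:
    # Stand-in for encoder + ANN search; returns (doc_id, cosine).
    return [("d2", 0.91), ("d3", 0.74)][:k]

def rank(lexical: list, semantic: list) -> list:
    # Fusion happens only here: merge candidates, keep features distinct.
    scores = {}
    for doc_id, bm25 in lexical:
        scores.setdefault(doc_id, {})["bm25"] = bm25
    for doc_id, cos in semantic:
        scores.setdefault(doc_id, {})["cosine"] = cos
    # Toy ranker: sum of available features; a learned model in practice.
    return sorted(scores, key=lambda d: sum(scores[d].values()), reverse=True)

def search(query: str) -> list:
    q = preprocess(query)
    with ThreadPoolExecutor(max_workers=2) as pool:
        lex = pool.submit(lexical_retrieve, q)   # arms run in parallel;
        sem = pool.submit(semantic_retrieve, q)  # neither waits on the other
        return rank(lex.result(), sem.result())
```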

Why decoupled, not merged

  • Independence at retrieval time means neither arm depends on the other's output; a failure of one does not degrade the other.
  • Features stay distinct — lexical features (TF-IDF, BM25) and semantic features (cosine similarity) enter the ranker as separate inputs, so the ranker learns the fusion instead of relying on a hand-tuned score combiner upstream.
  • Scalable and parallel — lexical and semantic have very different cost profiles (inverted-index lookup vs encoder + ANN); running them in parallel exploits both.
  • Separation of concerns — the lexical path can evolve (index compaction, tokenizer updates) independently of the encoder (model retraining, index rebuild).
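The first bullet's failure-isolation property takes only a few lines to demonstrate. The function and arm names below are illustrative, not from the post:

```python
def run_arm(retrieve, query):
    # Isolate each retrieval arm: an exception degrades that arm to an
    # empty candidate set instead of failing the whole query.
    try:
        return retrieve(query)
    except Exception:
        return []

def broken_semantic(query):
    raise TimeoutError("ANN index unavailable")  # simulated outage

def lexical(query):
    return [("d1", 1.2), ("d7", 0.8)]  # stand-in BM25 candidates

candidates = run_arm(lexical, "q") + run_arm(broken_semantic, "q")
# lexical candidates still reach the ranker despite the semantic outage
```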

Canonical instance — Meta Groups Scoped Search (2026-04-21)

From the 2026-04-21 Meta Engineering post:

"We modernized the retrieval stage by decoupling the query processing into two parallel pathways, ensuring we capture both exact terms and broad concepts."

Concrete components:

  • Shared preprocessing: tokenization, normalization, rewriting.
  • Lexical path: Unicorn inverted index.
  • Semantic path: SSR (12-layer 200M-param) → Faiss ANN.
  • Ranker: MTML L2 supermodel on clicks + shares + comments.
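The semantic path reduces to: encode the query, then nearest-neighbor search over precomputed document embeddings. In Meta's stack that search is Faiss with a trained SSR encoder; the sketch below substitutes random unit vectors and an exact numpy search, so all numbers and names are stand-ins for the shape of the computation.

```python
import numpy as np

# Stand-in corpus: 1,000 docs, 64-dim unit-normalized embeddings.
rng = np.random.default_rng(0)
doc_emb = rng.normal(size=(1000, 64)).astype("float32")
doc_emb /= np.linalg.norm(doc_emb, axis=1, keepdims=True)

def semantic_topk(query_emb: np.ndarray, k: int = 10):
    # Cosine similarity via inner product on unit vectors, then top-k.
    q = query_emb / np.linalg.norm(query_emb)
    sims = doc_emb @ q
    idx = np.argpartition(-sims, k)[:k]   # unordered top-k candidates
    idx = idx[np.argsort(-sims[idx])]     # sort candidates by score
    return list(zip(idx.tolist(), sims[idx].tolist()))
```

An ANN index (Faiss) trades the exactness of this brute-force search for sub-linear lookup over billions of vectors; the interface (query embedding in, top-k doc ids and scores out) stays the same.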

Meta's own characterization of the ranker stage:

"Merging results from two fundamentally different paradigms — sparse lexical features and dense semantic features — required a sophisticated ranking strategy."

Relation to sibling patterns

Caveats

  • The ranker becomes the fusion bottleneck — its architecture must handle heterogeneous feature sets (sparse + dense). Meta explicitly notes this: "Merging results from two fundamentally different paradigms... required a sophisticated ranking strategy" — MTML with FP8/selective layers in their case.
  • Query preprocessing must serve both arms, which constrains its design (cf concepts/query-preprocessing-tokenization-normalization). A preprocessing change that helps lexical but hurts the encoder's distribution can regress semantic quality.
  • Candidate-set sizing matters — each arm surfaces its own top-K, so the ranker must score the deduplicated union, which can be as large as the two K values combined; both latency and result quality depend on how each arm's K is tuned.
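The sizing caveat in the last bullet can be made concrete (the K values and doc ids below are illustrative): the ranker's input is the deduplicated union of the two top-K lists, bounded above by K_lex + K_sem entries.

```python
def merge_candidates(lexical, semantic):
    # Union by doc id; each doc keeps its per-arm features separately,
    # so overlapping docs contribute one ranker input with both features.
    merged = {}
    for doc_id, score in lexical:
        merged.setdefault(doc_id, {})["bm25"] = score
    for doc_id, score in semantic:
        merged.setdefault(doc_id, {})["cosine"] = score
    return merged

lex = [("d1", 3.2), ("d2", 2.9)]    # top-K from the lexical arm
sem = [("d2", 0.93), ("d9", 0.81)]  # top-K from the semantic arm
pool = merge_candidates(lex, sem)
# 4 candidate entries collapse to 3 ranker inputs because d2 overlaps
```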

Seen in

Last updated · 550 distilled / 1,221 read