PATTERN
Decoupled parallel retrieval pipelines¶
Pattern¶
Run lexical (inverted-index + BM25) and dense semantic (encoder + ANN) retrieval as parallel, independent pipelines, fed by a shared query-preprocessing stage, and merge candidates only at the ranker — not at retrieval time, not via score fusion upstream.
                 query
                   │
                   ▼
        ┌──────────────────────┐
        │ Query preprocessing  │ ← tokenize / normalize / rewrite
        │    (shared stage)    │
        └──────────┬───────────┘
                   │
         ┌─────────┴─────────┐
         │                   │
         ▼                   ▼
  ┌─────────────┐     ┌─────────────┐
  │   Lexical   │     │  Semantic   │
  │   Unicorn   │     │ SSR + Faiss │
  │ →candidates │     │ →candidates │
  └──────┬──────┘     └──────┬──────┘
         │                   │
         └─────────┬─────────┘
                   │
                   ▼
         ┌────────────────────┐
         │   L2 MTML ranker   │
         │  merges + reranks  │
         │ features: TF-IDF,  │
         │  BM25, cosine, …   │
         └────────────────────┘
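A minimal sketch of the shape above, with hypothetical in-memory stand-ins for each component (term-overlap for the BM25 arm, character-bigram Jaccard for the encoder arm; a real system would use an inverted index and an ANN service). The point is structural: both arms consume the same preprocessed query, run in parallel, and their candidates meet only in the merged feature dicts handed to the ranker.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy corpus standing in for the indexed documents.
DOCS = {
    1: "hiking groups near seattle",
    2: "best trail running shoes",
    3: "seattle photography club",
}

def preprocess(query: str) -> list[str]:
    """Shared stage: tokenize + normalize (here just lowercase + split)."""
    return query.lower().split()

def lexical_retrieve(tokens: list[str], k: int = 2) -> list[tuple[int, float]]:
    """Inverted-index/BM25 stand-in: score by exact term overlap."""
    scored = [(doc_id, float(sum(t in text.split() for t in tokens)))
              for doc_id, text in DOCS.items()]
    scored = [(d, s) for d, s in scored if s > 0]
    return sorted(scored, key=lambda x: -x[1])[:k]

def semantic_retrieve(tokens: list[str], k: int = 2) -> list[tuple[int, float]]:
    """Encoder + ANN stand-in: fuzzy similarity via character bigrams."""
    def bigrams(s: str) -> set[str]:
        return {s[i:i + 2] for i in range(len(s) - 1)}
    q = bigrams(" ".join(tokens))
    scored = [(doc_id, len(q & bigrams(text)) / (len(q | bigrams(text)) or 1))
              for doc_id, text in DOCS.items()]
    return sorted(scored, key=lambda x: -x[1])[:k]

def retrieve(query: str) -> dict[int, dict[str, float]]:
    """Run both arms in parallel; merge candidates only here, keeping
    lexical and semantic scores as separate ranker features."""
    tokens = preprocess(query)           # shared preprocessing stage
    with ThreadPoolExecutor() as pool:   # independent, parallel arms
        lex_f = pool.submit(lexical_retrieve, tokens)
        sem_f = pool.submit(semantic_retrieve, tokens)
        lex, sem = lex_f.result(), sem_f.result()
    merged: dict[int, dict[str, float]] = {}
    for doc_id, score in lex:
        merged.setdefault(doc_id, {})["bm25"] = score
    for doc_id, score in sem:
        merged.setdefault(doc_id, {})["cosine"] = score
    return merged  # the L2 ranker would score these feature dicts

candidates = retrieve("Seattle hiking")
```

Note that a document surfaced by only one arm simply lacks the other arm's feature; the ranker, not an upstream combiner, decides what that means.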
Why decoupled, not merged¶
- Independence at retrieval time means neither arm depends on the other's output; a failure of one does not degrade the other.
- Features stay distinct — lexical features (TF-IDF, BM25) and semantic features (cosine similarity) enter the ranker as separate inputs, so the ranker learns the fusion instead of relying on a hand-tuned upstream score combiner.
- Scalable and parallel — lexical and semantic have very different cost profiles (inverted-index lookup vs encoder + ANN); running them in parallel exploits both.
- Separation of concerns — the lexical path can evolve (index compaction, tokenizer updates) independently of the encoder (model retraining, index rebuild).
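The independence property in the first bullet can be made concrete: if each arm is wrapped so its failures yield an empty candidate set, an outage in one pathway degrades recall but never blocks the other. A hypothetical sketch (`safe`, `flaky_semantic` are illustrative names, not from the source):

```python
from concurrent.futures import ThreadPoolExecutor

def safe(fn, *args):
    """Run one retrieval arm; a failure yields an empty candidate set
    instead of taking down the sibling arm or the whole query."""
    try:
        return fn(*args)
    except Exception:
        return []

def flaky_semantic(tokens):
    # Simulate the encoder/ANN service being down.
    raise TimeoutError("ANN service unavailable")

def lexical(tokens):
    return [("doc1", 2.0)]

with ThreadPoolExecutor() as pool:
    lex = pool.submit(safe, lexical, ["q"])
    sem = pool.submit(safe, flaky_semantic, ["q"])
    candidates = lex.result() + sem.result()
# candidates == [("doc1", 2.0)]: lexical results survive the semantic outage
```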
Canonical instance — Meta Groups Scoped Search (2026-04-21)¶
From the 2026-04-21 Meta Engineering post:
"We modernized the retrieval stage by decoupling the query processing into two parallel pathways, ensuring we capture both exact terms and broad concepts."
Concrete components:
- Shared preprocessing: tokenization, normalization, rewriting.
- Lexical path: Unicorn inverted index.
- Semantic path: SSR (12-layer 200M-param) → Faiss ANN.
- Ranker: MTML L2 supermodel trained on clicks, shares, and comments.
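The semantic path's shape (encoder → similarity search over an index) can be sketched as follows. This is not Meta's stack: the pseudo-embedding stands in for the SSR encoder, and the brute-force class mimics only the add/search interface of an exact inner-product index like `faiss.IndexFlatIP` (Faiss's ANN indexes trade exactness for speed at scale).

```python
import zlib
import numpy as np

DIM = 8

def encode(text: str) -> np.ndarray:
    """Stand-in for the encoder: a deterministic pseudo-embedding,
    unit-normalized so inner product equals cosine similarity."""
    seed = zlib.crc32(text.encode())
    vec = np.random.default_rng(seed).normal(size=DIM)
    return vec / np.linalg.norm(vec)

class FlatIPIndex:
    """Brute-force inner-product search over stored unit vectors."""
    def __init__(self, dim: int):
        self.vectors = np.empty((0, dim))
        self.ids: list[str] = []

    def add(self, doc_id: str, vec: np.ndarray) -> None:
        self.vectors = np.vstack([self.vectors, vec])
        self.ids.append(doc_id)

    def search(self, query_vec: np.ndarray, k: int) -> list[tuple[str, float]]:
        sims = self.vectors @ query_vec        # cosine: all vectors unit-norm
        top = np.argsort(-sims)[:k]
        return [(self.ids[i], float(sims[i])) for i in top]

index = FlatIPIndex(DIM)
for doc in ["hiking groups", "trail shoes", "photography club"]:
    index.add(doc, encode(doc))
hits = index.search(encode("hiking groups"), k=2)
```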
The canonical statement Meta makes about the ranker stage:
"Merging results from two fundamentally different paradigms — sparse lexical features and dense semantic features — required a sophisticated ranking strategy."
Relation to sibling patterns¶
- patterns/hybrid-retrieval-bm25-vectors — the umbrella pattern. This pattern is the decoupled-parallel-pipelines realization.
- patterns/parallel-retrieval-fusion — focuses on the ranker-side fusion. Complementary: fusion is what this pattern's pipelines feed into.
- patterns/hybrid-lexical-vector-interleaving — an interleaving/score-fusion realization where lexical + semantic candidates are combined before ranking (min-max normalization, RRF, etc.). A distinct architecture from this pattern's "merge at ranker".
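To make the contrast with the interleaving realization concrete, here is a sketch of Reciprocal Rank Fusion, one of the pre-ranking combination schemes that sibling pattern names. Candidates are collapsed into a single fused list before any ranker sees them, which is exactly what the decoupled pattern avoids (k=60 is the conventional RRF constant).

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: combine ranked candidate lists *before*
    ranking. Each list contributes 1/(k + rank) per document."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical_top = ["d1", "d2", "d3"]
semantic_top = ["d3", "d1", "d4"]
fused = rrf([lexical_top, semantic_top])
```

After fusion the downstream ranker sees only one score per document; the per-arm lexical and semantic signals are gone, which is the architectural trade-off this pattern rejects by merging at the ranker instead.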
Caveats¶
- The ranker becomes the fusion bottleneck — its architecture must handle heterogeneous feature sets (sparse + dense). Meta explicitly notes this: "Merging results from two fundamentally different paradigms... required a sophisticated ranking strategy" — MTML with FP8/selective layers in their case.
- Query preprocessing must serve both arms, which constrains its design (cf. concepts/query-preprocessing-tokenization-normalization). A preprocessing change that helps lexical matching but shifts the encoder's input distribution can regress semantic quality.
- Candidate-set sizing matters — each arm surfaces its own top-K, so the ranker must handle a merged set of up to K_lex + K_sem candidates; per-arm K trades recall against ranker latency.
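The sizing arithmetic in the last caveat is simple but worth stating: the merged set is the deduplicated union of the two top-K lists, so its size ranges from max(K_lex, K_sem) (total overlap) to K_lex + K_sem (no overlap), and the ranker must be provisioned for the worst case. A minimal sketch (names hypothetical):

```python
def merge_candidates(lexical: list[str], semantic: list[str]) -> list[str]:
    """Deduplicated union of the two arms' top-K lists, preserving
    lexical-arm order first. Overlap shrinks the merged set; the ranker
    still has to be sized for K_lex + K_sem."""
    merged = dict.fromkeys(lexical)      # insertion-ordered, dedups
    merged.update(dict.fromkeys(semantic))
    return list(merged)

lex_top = ["d1", "d2", "d3"]   # K_lex = 3
sem_top = ["d3", "d4"]         # K_sem = 2
merged = merge_candidates(lex_top, sem_top)
# len(merged) == 4: one overlapping id; worst case would be 5
```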