

Decoupled parallel retrieval pipelines

Pattern

Run lexical (inverted-index + BM25) and dense semantic (encoder + ANN) retrieval as parallel, independent pipelines, fed by a shared query-preprocessing stage, and merge candidates only at the ranker — not at retrieval time, not via score fusion upstream.

                query
        ┌──────────────────────┐
        │ Query preprocessing  │ ← tokenize / normalize / rewrite
        │ (shared stage)       │
        └──────────┬───────────┘
        ┌──────────┴───────────┐
        │                      │
        ▼                      ▼
  ┌──────────────┐      ┌──────────────┐
  │ Lexical      │      │ Semantic     │
  │ Unicorn      │      │ SSR + Faiss  │
  │ → candidates │      │ → candidates │
  └──────┬───────┘      └──────┬───────┘
         │                     │
         └──────────┬──────────┘
                    ▼
         ┌────────────────────┐
         │ L2 MTML ranker     │
         │  merges + reranks  │
         │  features: TF-IDF, │
         │  BM25, cosine, …   │
         └────────────────────┘
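The shape of the pattern can be sketched as a small program. Every name below is a hypothetical stand-in (toy term-count "BM25", canned semantic scores), not Meta's actual stack; the point is only the structure: one shared preprocessing call, two arms submitted independently, fusion deferred to the ranker.

```python
# Sketch of the decoupled-pipeline shape; all components are stand-ins.
from concurrent.futures import ThreadPoolExecutor

def preprocess(query: str) -> dict:
    # Shared stage: tokenize / normalize once, for both arms.
    return {"raw": query, "tokens": query.lower().split()}

def lexical_retrieve(q: dict, k: int = 3) -> list:
    # Stand-in for an inverted-index + BM25 lookup.
    docs = {"d1": "parallel retrieval pipelines", "d2": "dense encoders"}
    return [(doc_id, sum(t in text for t in q["tokens"]))
            for doc_id, text in docs.items()][:k]

def semantic_retrieve(q: dict, k: int = 3) -> list:
    # Stand-in for encoder + ANN search; returns (doc_id, cosine).
    return [("d2", 0.91), ("d3", 0.74)][:k]

def rank(lexical: list, semantic: list) -> list:
    # Fusion happens only here: merge candidates, keep features distinct.
    scores = {}
    for doc_id, bm25 in lexical:
        scores.setdefault(doc_id, {})["bm25"] = bm25
    for doc_id, cos in semantic:
        scores.setdefault(doc_id, {})["cosine"] = cos
    # Toy ranker: sum of available features; a learned model in practice.
    return sorted(scores, key=lambda d: sum(scores[d].values()), reverse=True)

def search(query: str) -> list:
    q = preprocess(query)
    with ThreadPoolExecutor(max_workers=2) as pool:
        lex = pool.submit(lexical_retrieve, q)   # arms run in parallel;
        sem = pool.submit(semantic_retrieve, q)  # neither waits on the other
        return rank(lex.result(), sem.result())
```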

Why decoupled, not merged

  • Independence at retrieval time means neither arm depends on the other's output; a failure of one does not degrade the other.
  • Features stay distinct — lexical features (TF-IDF, BM25) and semantic features (cosine similarity) enter the ranker as separate inputs, so the ranker learns the fusion instead of relying on a hand-tuned score combiner upstream.
  • Scalable and parallel — lexical and semantic have very different cost profiles (inverted-index lookup vs encoder + ANN); running them in parallel exploits both.
  • Separation of concerns — the lexical path can evolve (index compaction, tokenizer updates) independently of the encoder (model retraining, index rebuild).
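The first bullet's failure-isolation property takes only a few lines to demonstrate. The function and arm names below are illustrative, not from the post:

```python
def run_arm(retrieve, query):
    # Isolate each retrieval arm: an exception degrades that arm to an
    # empty candidate set instead of failing the whole query.
    try:
        return retrieve(query)
    except Exception:
        return []

def broken_semantic(query):
    raise TimeoutError("ANN index unavailable")  # simulated outage

def lexical(query):
    return [("d1", 1.2), ("d7", 0.8)]  # stand-in BM25 candidates

candidates = run_arm(lexical, "q") + run_arm(broken_semantic, "q")
# lexical candidates still reach the ranker despite the semantic outage
```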

Canonical instance — Meta Groups Scoped Search (2026-04-21)

From the 2026-04-21 Meta Engineering post:

"We modernized the retrieval stage by decoupling the query processing into two parallel pathways, ensuring we capture both exact terms and broad concepts."

Concrete components:

  • Shared preprocessing: tokenization, normalization, rewriting.
  • Lexical path: Unicorn inverted index.
  • Semantic path: SSR (12-layer 200M-param) → Faiss ANN.
  • Ranker: MTML L2 supermodel on clicks + shares + comments.
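The semantic path reduces to: encode the query, then nearest-neighbor search over precomputed document embeddings. In Meta's stack that search is Faiss with a trained SSR encoder; the sketch below substitutes random unit vectors and an exact numpy search, so all numbers and names are stand-ins for the shape of the computation.

```python
import numpy as np

# Stand-in corpus: 1,000 docs, 64-dim unit-normalized embeddings.
rng = np.random.default_rng(0)
doc_emb = rng.normal(size=(1000, 64)).astype("float32")
doc_emb /= np.linalg.norm(doc_emb, axis=1, keepdims=True)

def semantic_topk(query_emb: np.ndarray, k: int = 10):
    # Cosine similarity via inner product on unit vectors, then top-k.
    q = query_emb / np.linalg.norm(query_emb)
    sims = doc_emb @ q
    idx = np.argpartition(-sims, k)[:k]   # unordered top-k candidates
    idx = idx[np.argsort(-sims[idx])]     # sort candidates by score
    return list(zip(idx.tolist(), sims[idx].tolist()))
```

An ANN index (Faiss) trades the exactness of this brute-force search for sub-linear lookup over billions of vectors; the interface (query embedding in, top-k doc ids and scores out) stays the same.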

Meta's own characterization of the ranker stage:

"Merging results from two fundamentally different paradigms — sparse lexical features and dense semantic features — required a sophisticated ranking strategy."

Relation to sibling patterns

Caveats

  • The ranker becomes the fusion bottleneck — its architecture must handle heterogeneous feature sets (sparse + dense). Meta explicitly notes this: "Merging results from two fundamentally different paradigms... required a sophisticated ranking strategy" — MTML with FP8/selective layers in their case.
  • Query preprocessing must serve both arms, which constrains its design (cf concepts/query-preprocessing-tokenization-normalization). A preprocessing change that helps lexical but hurts the encoder's distribution can regress semantic quality.
  • Candidate-set sizing matters — each arm surfaces its own top-K, so the ranker must score the deduplicated union, which can be as large as the two K values combined; both latency and result quality depend on how each arm's K is tuned.
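The sizing caveat in the last bullet can be made concrete (the K values and doc ids below are illustrative): the ranker's input is the deduplicated union of the two top-K lists, bounded above by K_lex + K_sem entries.

```python
def merge_candidates(lexical, semantic):
    # Union by doc id; each doc keeps its per-arm features separately,
    # so overlapping docs contribute one ranker input with both features.
    merged = {}
    for doc_id, score in lexical:
        merged.setdefault(doc_id, {})["bm25"] = score
    for doc_id, score in semantic:
        merged.setdefault(doc_id, {})["cosine"] = score
    return merged

lex = [("d1", 3.2), ("d2", 2.9)]    # top-K from the lexical arm
sem = [("d2", 0.93), ("d9", 0.81)]  # top-K from the semantic arm
pool = merge_candidates(lex, sem)
# 4 candidate entries collapse to 3 ranker inputs because d2 overlaps
```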

Seen in

Last updated · 550 distilled / 1,221 read