CONCEPT Cited by 4 sources

Hybrid retrieval (BM25 + dense vectors)

Hybrid retrieval is the pattern of combining a lexical index (BM25 / keyword) with a dense vector index (semantic embeddings) in the same retrieval pipeline, so the ranker can exploit exact-term matching and paraphrase/synonym matching at the same time.

Why "hybrid" and not one or the other

  • BM25 alone is superb at exact-term matching, acronyms, proper nouns, and queries where the user already knows the right word. Weakness: paraphrase / synonyms / cross-domain term transfer.
  • Dense vectors alone are strong at semantic matching and paraphrase robustness. Weakness: can miss exact-term matches that lexical would nail; embedding drift / out-of-domain failure.
  • Hybrid lets each cover the other's weaknesses. Ranking typically fuses lexical + semantic scores (weighted sum, reciprocal rank fusion, or a learned ranker on top).

BM25 is a workhorse, not a fallback

Dash's own framing from sources/2026-01-28-dropbox-knowledge-graphs-mcp-dspy-dash:

"Today we use both a lexical index—using BM25—and then store everything as dense vectors in a vector store. While this allows us to do hybrid retrieval, we found BM25 was very effective on its own with some relevant signals. It's an amazing workhorse for building out an index."

Two production signals in that quote:

  1. BM25 is the primary retrieval surface, not a fallback or a pre-filter. "Very effective on its own with some relevant signals" is a strong claim from a team doing modern agentic RAG.
  2. Dense vectors are additive, not a replacement. Hybrid enables better recall on paraphrase queries without displacing BM25 as the lexical anchor.

This contradicts a common vendor-pitched trajectory where pure vector retrieval is positioned as the successor to BM25. At Dash's scale and domain, BM25 kept its seat.
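The scoring function behind that workhorse is compact. A minimal pure-Python sketch of Okapi BM25 — toy whitespace tokenization, no inverted-index structures, and the textbook `k1`/`b` defaults rather than anything Dash has published:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against query_terms with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()                      # document frequency per term
    for d in docs:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in docs:
        tf, dl, s = Counter(d), len(d), 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * dl / avgdl))
        scores.append(s)
    return scores

docs = [
    "dropbox dash hybrid retrieval".split(),
    "bm25 lexical index workhorse".split(),
    "dense vector embeddings semantic".split(),
]
print(bm25_scores("bm25 index".split(), docs))  # only the middle doc matches
```

Exact-term behaviour is visible immediately: documents sharing no query tokens score exactly zero, which is precisely the paraphrase gap the dense side exists to cover.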

Typical hybrid pipeline shape

  1. Ingest → normalize to text (markdown normalization, text extraction from docs / images / PDFs / audio / video).
  2. Parallel: BM25 indexing + embedding generation.
  3. At query time: parallel retrieval from both indexes, top-K each.
  4. Fusion / re-ranking: scores combined (RRF, weighted sum, or a learned ranker).
  5. Multiple ranking passes applied on top (personalization, ACL filters, knowledge-graph edges — Dash's framing).
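Steps 3–4 of that shape can be sketched as follows. `bm25_search` and `vector_search` are hypothetical stubs standing in for real index clients; the fusion pass uses RRF with the conventional k=60 smoothing constant:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical index stubs; a real system would call a BM25 engine and a
# vector store here. Each returns [(doc_id, score), ...] best-first.
def bm25_search(query, k):
    return [("doc-a", 12.3), ("doc-b", 9.1)][:k]

def vector_search(query, k):
    return [("doc-b", 0.87), ("doc-c", 0.71)][:k]

def hybrid_retrieve(query, k=50):
    # Step 3: query both indexes in parallel, top-K each.
    with ThreadPoolExecutor(max_workers=2) as pool:
        lex_f = pool.submit(bm25_search, query, k)
        sem_f = pool.submit(vector_search, query, k)
        lex, sem = lex_f.result(), sem_f.result()
    # Step 4: fuse by reciprocal rank; raw scores are ignored, only
    # rank positions matter, so no cross-index calibration is needed.
    fused = {}
    for results in (lex, sem):
        for rank, (doc_id, _score) in enumerate(results, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (60 + rank)
    return sorted(fused.items(), key=lambda kv: -kv[1])

print(hybrid_retrieve("hybrid retrieval"))  # doc-b wins: both retrievers agree
```

The cross-retriever-consensus effect shows up directly: `doc-b` appears in both lists and outranks either list's own top hit.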

Fusion techniques: RRF vs RSF (the two industry standards)

MongoDB's 2025-09-30 post (sources/2025-09-30-mongodb-top-considerations-when-choosing-a-hybrid-search-solution) names the two fusion algorithms that "have become standard techniques in the market":

  • Reciprocal Rank Fusion (RRF) — rank-position-based fusion. No score normalization needed; rewards cross-retriever consensus; standard form ∑ 1/(k + rank_r(d)) with k typically 60.
  • Relative Score Fusion (RSF) — score-value-based fusion with per-retriever normalization; preserves score-magnitude information; more granular than rank alone; requires normalization-method + weight tuning.

RRF is the default starting point (no calibration, universal defaults work); RSF is adopted when the rank-based information loss costs measurable quality. Other fusion shapes exist in the design space — Figma's min-max-normalized + exact-match-boost + interleave is a specific RSF-family realization.
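Both fusion shapes can be sketched in a few lines each; the function names and toy rankings below are illustrative, not drawn from any of the cited sources:

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: sum 1/(k + rank) across retrievers."""
    fused = {}
    for ranking in rankings:              # each ranking: doc ids, best first
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)

def rsf(score_maps, weights):
    """Relative Score Fusion: min-max normalize each retriever's raw
    scores, then take a weighted sum — keeps score-magnitude information
    that RRF throws away, at the cost of normalization + weight tuning."""
    fused = {}
    for scores, w in zip(score_maps, weights):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        for doc, s in scores.items():
            fused[doc] = fused.get(doc, 0.0) + w * (s - lo) / span
    return sorted(fused, key=fused.get, reverse=True)

# "doc2" is mid-ranked by both retrievers; RRF rewards that consensus.
lexical = ["doc1", "doc2", "doc3"]
semantic = ["doc4", "doc2", "doc5"]
print(rrf([lexical, semantic]))  # doc2 first
```

Note what `rrf` never looks at: the raw scores. That is both its robustness (no calibration) and the information loss that motivates switching to RSF.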

Industry evolution: vendor-origin architecture bias

From MongoDB's 2025-09-30 post, vendor architectural origins predict hybrid-search shape:

  • Lexical-first platforms (MongoDB, Elasticsearch, OpenSearch, Solr) built around BM25 on inverted indexes; added vector search as a second index type; typically use separate indexes fused at query time. "The main challenge was to add vector search features and implement the bridging logic with their existing keyword search infrastructure."
  • Vector-first platforms (Pinecone, Weaviate, Milvus, Qdrant) built around dense-vector ANN; added lexical via sparse vectors rather than inverted indexes — "Implementing lexical search through traditional inverted indexes was often too costly due to storage differences, increased query complexity, and architectural overhead. Many adopted sparse vectors, which represent keyword importance in a way similar to traditional term-frequency methods used in lexical search."

MongoDB's architectural framing: "lexical-first systems tend to offer stronger keyword capabilities and more flexibility in tuning each search type, while vector-first systems provide a simpler, more unified hybrid experience." Wiki treats this as one-vendor positioning — the more neutral reading is that the boundary is blurring (Elasticsearch's ELSER emits learned sparse vectors; Pinecone supports hybrid natively).

Native hybrid search functions (the 2025 productization trend)

The 2025-09-30 MongoDB post names the industry-level convergence toward native hybrid-search primitives — database / search engine APIs that handle fusion internally rather than leaving score combination to application code. Examples: MongoDB Atlas Hybrid Search (2025), Elasticsearch rrf retriever (8.8+), OpenSearch hybrid query, Weaviate hybrid operator, Qdrant hybrid queries, Pinecone sparse-dense. MongoDB's position: "Solutions with hybrid search functions handle the combination of lexical and vector search natively, removing the need for developers to manually implement it. This reduces development complexity, minimizes potential errors, and ensures that result merging and ranking are optimized by default."

Re-ranking (the layer above hybrid)

Hybrid retrieval returns a candidate set; re-ranking refines ordering on top. MongoDB names "cross-encoders, learning-to-rank models, and dynamic scoring profiles" as the emerging techniques. Re-ranking is not a replacement for hybrid retrieval — it sits on top, re-scoring the top-K candidates with more expensive but higher-quality models. Typical pipeline: hybrid retrieval → cross-encoder rerank → top-N to consumer.
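The hybrid-then-rerank layering can be sketched as follows; `cross_encoder_score` is a deliberately trivial stand-in (token overlap) for a real cross-encoder model, which would jointly encode each (query, document) pair:

```python
def cross_encoder_score(query, doc_text):
    # Stand-in for a real cross-encoder: jointly scores (query, doc).
    # Here: crude token overlap, purely to make the pipeline runnable.
    q, d = set(query.lower().split()), set(doc_text.lower().split())
    return len(q & d) / max(len(q), 1)

def rerank(query, candidates, top_n=3):
    """Re-score the hybrid top-K with the expensive model; keep top-N."""
    scored = [(cross_encoder_score(query, text), doc_id)
              for doc_id, text in candidates]
    scored.sort(reverse=True)
    return [doc_id for _score, doc_id in scored[:top_n]]

# Candidates as returned by the cheaper hybrid stage: (doc_id, text).
candidates = [("d1", "bm25 lexical index"),
              ("d2", "hybrid retrieval with bm25 and vectors"),
              ("d3", "cooking recipes")]
print(rerank("hybrid bm25 retrieval", candidates))
```

The shape, not the scorer, is the point: the expensive model only ever sees the candidate set, never the corpus.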

Composition with knowledge graphs

Dash layers a concepts/knowledge-graph on top of the hybrid index rather than replacing either lexical or vector component. The graph's "knowledge bundle" summaries are themselves re-ingested through the hybrid index pipeline (both BM25 and vector), so graph signals ride on the same retrieval surface rather than becoming a separate third query path. This keeps runtime retrieval a single fused lookup instead of three independent ones.
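The single-ingest-path idea can be sketched as below; `ingest`, `embed`, and the bundle IDs are hypothetical illustrations, not Dash's API:

```python
def ingest(doc_id, text, bm25_index, vector_index, embed):
    """One ingest path: every document hits both indexes."""
    bm25_index[doc_id] = text.split()    # stand-in for real BM25 indexing
    vector_index[doc_id] = embed(text)   # stand-in for real embedding call

bm25_index, vector_index = {}, {}
embed = lambda text: [float(len(text))]  # placeholder embedding function

# Raw documents and knowledge-bundle summaries share the same pipeline,
# so graph signals surface through the same fused lookup at query time
# instead of becoming a third independent query path.
ingest("doc-1", "quarterly planning notes", bm25_index, vector_index, embed)
ingest("bundle-planning", "summary: planning docs cluster around Q3 goals",
       bm25_index, vector_index, embed)
```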

Tradeoffs

  • Two indexes to maintain — double the ingestion + freshness plumbing.
  • Two indexes to size — vector stores scale differently from BM25 (memory-bound, dimensional), so capacity planning is separate.
  • Ranking complexity. Fusion weights + re-ranking become hyperparameters; offline eval against NDCG-style metrics becomes a first-class concern.
  • Embedding-drift blast radius. Changing embedding models requires re-indexing the entire corpus; BM25 does not have this problem. Separate versioning.
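The separate-versioning point can be sketched as below (all names are illustrative): the two index types carry independent version tags, and only an embedding-model change forces a corpus-wide re-embed.

```python
from dataclasses import dataclass

@dataclass
class IndexVersion:
    # BM25 and vector indexes version independently: swapping the
    # embedding model invalidates every stored vector, but leaves the
    # BM25 side untouched.
    bm25_analyzer: str
    embedding_model: str

def needs_full_reembed(current: IndexVersion, new: IndexVersion) -> bool:
    return current.embedding_model != new.embedding_model

current = IndexVersion(bm25_analyzer="porter-v1", embedding_model="emb-v1")
proposed = IndexVersion(bm25_analyzer="porter-v1", embedding_model="emb-v2")
print(needs_full_reembed(current, proposed))  # embedding change → re-index
```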

Seen in

  • sources/2026-01-28-dropbox-knowledge-graphs-mcp-dspy-dash — Dash explicitly running BM25 + dense vectors as a hybrid index; knowledge-graph-derived "bundles" flowing through the same hybrid pipeline; BM25 as "amazing workhorse".
  • sources/2026-04-21-figma-the-infrastructure-behind-ai-search-in-figma — Figma AI Search runs two independent OpenSearch indexes (one lexical / fuzzy-match over component names and descriptions, one k-NN over CLIP embeddings) queried simultaneously; scores combined via min-max normalization per index + exact-lexical-match boost + interleave (patterns/hybrid-lexical-vector-interleaving). Worked example: "mouse" returns the icon titled "Mouse" and cursor-adjacent icons. Preserves existing lexical behavior while adding semantic recall — migration-safe hybrid rollout shape.
  • sources/2025-09-30-mongodb-top-considerations-when-choosing-a-hybrid-search-solution — MongoDB's 2025-09-30 industry-evolution survey and buyer's guide. Names the 2022–2023 inflection when pure-vector retrieval proved insufficient; identifies RRF and RSF as the two standard fusion techniques; taxonomizes vendors as lexical-first vs vector-first; positions sparse vectors as vector-first platforms' bridging primitive to lexical; identifies the industry-level 2025 convergence on native hybrid-search functions (MongoDB Atlas's own release is one named instance, realized as systems/atlas-hybrid-search); names cross-encoders, learning-to-rank, and dynamic scoring profiles as the emerging re-ranking layer above hybrid retrieval.
  • sources/2026-04-16-cloudflare-ai-search-the-search-primitive-for-your-agents — Cloudflare AI Search promotes hybrid retrieval to a managed, runtime-provisioned primitive: vector + BM25 in parallel with fusion as an instance-level config (index_method, fusion_method: "rrf" | "max", reranking: true with @cf/baai/bge-reranker-base), plus per-document metadata boost at query time (concepts/metadata-boost) and cross-instance fan-out as composable layers on top. The 2026-04-16 worked example — "ERR_CONNECTION_REFUSED timeout" — is the canonical 2026 illustration of why both engines are needed (vector for paraphrase, BM25 for exact tokens), and the two-tokenizer config (porter for natural language, trigram for code) is the first-class productization of the BM25 content-type-awareness knob.