
PATTERN

Separate vs Combined Index (hybrid search topology)

Intent

The core architectural choice when deploying hybrid retrieval (lexical + vector): do you keep keyword and vector data in separate indexes, one per modality — or combine them into one index that stores both representations of each document together?

MongoDB's 2025-09-30 framing of the trade-off:

"Separate indexes give more freedom to tweak each search type, scale them differently, and experiment with scoring. The compromise is higher complexity, with two pipelines to manage and the need to normalize scores. On the other hand, a combined index is easier to manage, avoids duplicate pipelines, and can be faster since both searches run in a single pass. However, it limits flexibility to what the search engine supports and ties the scaling of keyword and vector search together. The decision is mainly a trade-off between control and simplicity."

— (MongoDB, 2025-09-30)

The two topologies aren't equal — each has a signature vendor profile, a signature operational shape, and a signature failure mode.

Two topologies

Separate indexes (lexical-first profile)

Document → ┌──→ [Lexical (BM25 inverted) index] → top-K lexical candidates ┐
           └──→ [Vector (HNSW / IVF) index]     → top-K vector candidates ─┴─→ fusion → top-N
  • Two independent ingestion pipelines (text → tokens → inverted index; text → embedding → vector index).
  • Two independent query paths, fused at a later stage.
  • Different scaling dimensions — lexical is disk-I/O-heavy (inverted index), vector is memory-heavy (graph/IVF). They scale by different physical resources.

Canonical vendors: Elasticsearch, OpenSearch, MongoDB Atlas (Atlas Search + Atlas Vector Search), Solr.
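The fan-out-and-fuse query path above can be sketched in a few lines. This is a minimal sketch, not any vendor's API: the two top-K functions are hypothetical stand-ins for real engine calls (a BM25 query against the inverted index, an ANN query against the vector index), and fusion uses reciprocal rank fusion because raw BM25 and cosine scores live on different scales.

```python
from concurrent.futures import ThreadPoolExecutor

def lexical_top_k(query, k=5):
    # Stand-in for a BM25 engine call; returns (doc_id, bm25_score), best first.
    return [("doc3", 12.4), ("doc1", 9.8), ("doc7", 4.1)][:k]

def vector_top_k(query, k=5):
    # Stand-in for an ANN engine call; returns (doc_id, cosine_sim), best first.
    return [("doc1", 0.91), ("doc9", 0.88), ("doc3", 0.65)][:k]

def hybrid_search(query, k=5, rrf_k=60):
    # Two parallel fan-outs, one per index, fused at a later stage.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(lexical_top_k, query, k),
                   pool.submit(vector_top_k, query, k)]
        rankings = [f.result() for f in futures]
    # Reciprocal rank fusion: fuse on rank, not raw score, because BM25 and
    # cosine scores are not comparable by default.
    fused = {}
    for ranking in rankings:
        for rank, (doc_id, _score) in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (rrf_k + rank)
    return sorted(fused, key=fused.get, reverse=True)[:k]
```

Note that the fusion step only sees document IDs and ranks; the two engines can disagree completely on score scale and the result is still well-defined.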

Combined index (vector-first profile)

Document → [One index: stores dense + sparse + metadata for each doc] → single-pass multi-modality query → top-N
  • One ingestion pipeline producing multiple representations per document.
  • One query structure with both dense and sparse vectors (often), plus metadata filters.
  • Shared scaling — the whole index scales together.

Canonical vendors: Pinecone, Weaviate, Milvus, Qdrant — typically realized via sparse vectors for the lexical side rather than an inverted index.
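The single-pass idea can be sketched as one scoring function over both representations stored with the document. The toy values and the `alpha` mixing weight are illustrative, not any vendor's formula; the only point is that one index lookup yields both a dense and a sparse contribution.

```python
def combined_score(q_dense, q_sparse, d_dense, d_sparse, alpha=0.5):
    # Dense contribution: dot product of query and document embeddings.
    dense = sum(a * b for a, b in zip(q_dense, d_dense))
    # Sparse contribution: dot product over shared term ids; this is the
    # lexical side, stored as a sparse term-weight vector in the same index.
    sparse = sum(w * d_sparse.get(term, 0.0) for term, w in q_sparse.items())
    # Single-pass blend of the two modalities.
    return alpha * dense + (1 - alpha) * sparse

# Toy document as stored in a combined index (illustrative values).
doc_dense = [0.1, 0.7, 0.2]
doc_sparse = {"hybrid": 1.2, "search": 0.8}
score = combined_score([0.0, 1.0, 0.0], {"hybrid": 1.0}, doc_dense, doc_sparse)
```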

Trade-offs side by side

  • Per-modality tuning. Separate: independent (different BM25 k1/b and different HNSW parameters per workload). Combined: constrained (the shared index structure dictates what can vary).
  • Scaling. Separate: independent (scale the lexical tier without touching the vector tier). Combined: coupled (the two modalities scale together).
  • Operational complexity. Separate: higher (two pipelines, two health models, two ingest-path failure modes). Combined: lower (one index, one pipeline).
  • Fusion. Separate: required (RRF / RSF / weighted sum / interleave), because scores aren't comparable by default. Combined: often handled natively by the engine (single-pass multi-modal score).
  • Ingestion cost. Separate: both sides indexed separately, so more total work. Combined: a single indexing pass.
  • Query latency. Separate: two round-trips (serial) or two fan-outs (parallel), plus a fusion step. Combined: one round-trip with engine-native combination.
  • Maturity of each modality. Separate (lexical-first deployments): strong BM25; the added vector side may be newer. Combined (vector-first deployments): strong vectors; the added lexical side (via sparse vectors) may be less mature.
  • Experimentation. Separate: easy (swap one engine, scoring function, or fusion algorithm without touching the other). Combined: harder (experiments are bounded by what the combined index lets you vary).
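The fusion row deserves a concrete illustration: because the two modalities score on different scales, rank-free fusion needs normalization first. A hedged sketch of the weighted-sum family (RSF-style), with min-max normalization; the weights and toy score dictionaries are illustrative, not a recommended configuration.

```python
def minmax(scores):
    # Rescale a {doc_id: raw_score} map to [0, 1] so scores from different
    # engines (BM25 vs cosine) become comparable before mixing.
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid divide-by-zero when all scores are equal
    return {d: (s - lo) / span for d, s in scores.items()}

def weighted_sum_fusion(lexical, vector, w_lex=0.4, w_vec=0.6):
    lex_n, vec_n = minmax(lexical), minmax(vector)
    docs = set(lex_n) | set(vec_n)
    # A doc missing from one modality contributes 0 from that side.
    return sorted(
        ((d, w_lex * lex_n.get(d, 0.0) + w_vec * vec_n.get(d, 0.0))
         for d in docs),
        key=lambda pair: pair[1], reverse=True)

# Toy raw scores: note the BM25-like and cosine-like scales differ wildly.
ranked = weighted_sum_fusion({"a": 12.0, "b": 4.0}, {"a": 0.9, "c": 0.8})
```

Swapping this function for the RRF variant is exactly the kind of experiment the separate-index topology makes cheap.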

Signature failure modes

Separate-indexes failure: ingestion drift

Two independent ingestion pipelines mean the two indexes can drift: a document present in the lexical index is missing from the vector index (embedding generation failed, or an embedding-model upgrade re-indexed only half the corpus), and hybrid queries silently return degraded results. Mitigations: shared document-ID sequencing, cross-index consistency checkers, periodic re-index discrepancy reports.
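A cross-index consistency checker can be as simple as diffing the document-ID sets of the two indexes. The ID lists here are assumed to come from whatever listing or export API each engine exposes; this sketch only shows the comparison, not the retrieval.

```python
def drift_report(lexical_ids, vector_ids):
    # Compare the document-ID populations of the two indexes.
    lexical_ids, vector_ids = set(lexical_ids), set(vector_ids)
    return {
        # In the lexical index but never embedded: hybrid queries will
        # silently under-rank these documents on the vector side.
        "missing_embeddings": sorted(lexical_ids - vector_ids),
        # In the vector index but absent from the lexical index, e.g. a
        # deletion that propagated to only one pipeline.
        "orphaned_vectors": sorted(vector_ids - lexical_ids),
    }

report = drift_report(["a", "b", "c"], ["a", "c", "d"])
```

Run as a periodic job, a non-empty report is the discrepancy signal the mitigation list above calls for.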

Combined-index failure: can't tune independently

Need to raise BM25 recall without blowing up the vector ANN memory? Can't — the index structure dictates both. Need to A/B test a new embedding model? The whole index has to be swapped at once. Operational simplicity costs tuning flexibility.

When to pick which

Pick separate indexes when:

  • Advanced lexical features matter — phrase queries, proximity, per-field boosts, language-specific analyzers, stemming variations. Inverted-index implementations (Lucene-family) lead here.
  • Modality-specific scaling shapes differ sharply — e.g. vector queries are 10× the QPS of lexical, or you want to put search traffic on a GPU tier.
  • You need to experiment independently with fusion strategies, different embedding models, new scoring functions.
  • Your team has the operational maturity to run two pipelines without drift.

MongoDB's thesis is that for advanced lexical requirements, a lexical-first substrate with separate indexes is the optimal shape — pairing BM25 on Lucene with vector ANN in a second index.

Pick combined index when:

  • Operational simplicity dominates — smaller team, less bandwidth for dual-pipeline management.
  • Lexical requirements are basic — mostly-keyword matching without phrase / proximity / boost complexity.
  • Latency is critical — a single-pass multi-modal query avoids the fan-out + fusion overhead.
  • You're already invested in a vector-first platform and don't want to introduce a second search tier.

The two topologies aren't always a binary choice: vendors like MongoDB Atlas offer separate indexes exposed through one unified query language. Atlas Search and Atlas Vector Search are physically separate (Search Nodes host them on their own compute tier for independent scaling), but the MQL $search and $vectorSearch aggregation stages sit in the same pipeline. A native hybrid-search function further unifies this by handling fusion at the engine level. This composite is "separate indexes, combined surface": the control of separate physical tiers with the simplicity of a single query API.
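A rough sketch of what the combined surface looks like, assuming $vectorSearch and $search stages composed in one aggregation pipeline as published MongoDB examples do. The collection name, index names, field paths, and query vector are all placeholders, and the rank-fusion stages that would follow the union are elided.

```python
# Placeholder names throughout; built as plain Python dicts for illustration.
hybrid_pipeline = [
    {"$vectorSearch": {
        "index": "vector_index",           # hypothetical vector index name
        "path": "embedding",
        "queryVector": [0.12, -0.07, 0.33],  # placeholder query embedding
        "numCandidates": 200,
        "limit": 20,
    }},
    {"$unionWith": {
        "coll": "articles",                # hypothetical collection name
        "pipeline": [
            {"$search": {
                "index": "lexical_index",  # hypothetical search index name
                "text": {"query": "hybrid search", "path": "body"},
            }},
            {"$limit": 20},
        ],
    }},
    # ...followed by rank-based fusion stages (e.g. $group / $setWindowFields)
]
```

Two physically separate indexes are queried, yet the application sees one pipeline, which is the "combined surface" in miniature.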
