PATTERN Cited by 1 source
Separate vs Combined Index (hybrid search topology)¶
Intent¶
The core architectural choice when deploying hybrid retrieval (lexical + vector): do you keep keyword and vector data in separate indexes, one per modality — or combine them into one index that stores both representations of each document together?
MongoDB's 2025-09-30 framing of the trade-off:
"Separate indexes give more freedom to tweak each search type, scale them differently, and experiment with scoring. The compromise is higher complexity, with two pipelines to manage and the need to normalize scores. On the other hand, a combined index is easier to manage, avoids duplicate pipelines, and can be faster since both searches run in a single pass. However, it limits flexibility to what the search engine supports and ties the scaling of keyword and vector search together. The decision is mainly a trade-off between control and simplicity."
The two topologies aren't equal — each has a signature vendor profile, a signature operational shape, and a signature failure mode.
Two topologies¶
Separate indexes (lexical-first profile)¶
Document → ┌──→ [Lexical (BM25 inverted) index] → top-K lexical candidates ┐
└──→ [Vector (HNSW / IVF) index] → top-K vector candidates ─┴─→ fusion → top-N
- Two independent ingestion pipelines (text → tokens → inverted index; text → embedding → vector index).
- Two independent query paths, fused at a later stage.
- Different scaling dimensions — lexical is disk-I/O-heavy (inverted index), vector is memory-heavy (graph/IVF). They scale by different physical resources.
Canonical vendors: Elasticsearch, OpenSearch, MongoDB Atlas (Atlas Search + Atlas Vector Search), Solr.
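The two-path shape above can be sketched as a parallel fan-out followed by a fusion step. This is a minimal illustration, not any vendor's API: the two `*_search` functions are hypothetical stand-ins (e.g. a BM25 query against Elasticsearch and an HNSW query against a vector index), with hard-coded hits so the fusion logic is visible.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-index clients -- stand-ins for a real BM25 query
# and a real ANN vector query against two physically separate indexes.
def lexical_search(query: str, k: int) -> list[tuple[str, float]]:
    return [("doc3", 12.4), ("doc1", 9.8), ("doc7", 4.1)][:k]  # (doc_id, BM25 score)

def vector_search(query_vec: list[float], k: int) -> list[tuple[str, float]]:
    return [("doc1", 0.91), ("doc5", 0.88), ("doc3", 0.70)][:k]  # (doc_id, cosine sim)

def hybrid_query(query: str, query_vec: list[float], k: int = 10) -> list[str]:
    # Fan out to both indexes in parallel -- two round-trips, overlapped.
    with ThreadPoolExecutor(max_workers=2) as pool:
        lex = pool.submit(lexical_search, query, k)
        vec = pool.submit(vector_search, query_vec, k)
        lex_hits, vec_hits = lex.result(), vec.result()
    # BM25 scores and cosine similarities are not comparable, so fuse
    # by rank instead of by score (reciprocal rank fusion, k = 60).
    scores: dict[str, float] = {}
    for hits in (lex_hits, vec_hits):
        for rank, (doc_id, _score) in enumerate(hits):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (60 + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Note that the fusion step never looks at the raw scores — only at ranks — which is exactly why it is a common default for separate indexes whose scoring scales differ.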
Combined index (vector-first profile)¶
Document → [One index: stores dense + sparse + metadata for each doc] → single-pass multi-modality query → top-N
- One ingestion pipeline producing multiple representations per document.
- One query structure carrying both dense and (often) sparse vectors, plus metadata filters.
- Shared scaling — the whole index scales together.
Canonical vendors: Pinecone, Weaviate, Milvus, Qdrant — typically realized via sparse vectors for the lexical side rather than an inverted index.
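The single-pass shape can be sketched with an in-memory toy: each record carries a dense embedding, a sparse term-weight map, and metadata, and one traversal filters and scores both modalities together. The blending formula (`alpha`-weighted sum) and all names here are illustrative assumptions, not a specific engine's behavior.

```python
import math

# One record per document: dense embedding + sparse (term -> weight) + metadata.
# A toy stand-in for what vector-first engines store in a single combined index.
INDEX = [
    {"id": "doc1", "dense": [0.9, 0.1], "sparse": {"hybrid": 1.2, "search": 0.8}, "lang": "en"},
    {"id": "doc2", "dense": [0.2, 0.95], "sparse": {"vector": 1.5}, "lang": "de"},
]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def combined_query(dense_q, sparse_q, lang, alpha=0.5, k=10):
    # Single pass: filter on metadata, then score both modalities per document.
    hits = []
    for doc in INDEX:
        if doc["lang"] != lang:
            continue
        d = cosine(dense_q, doc["dense"])                                  # dense side
        s = sum(w * doc["sparse"].get(t, 0.0) for t, w in sparse_q.items())  # sparse side
        hits.append((doc["id"], alpha * d + (1 - alpha) * s))
    return sorted(hits, key=lambda h: h[1], reverse=True)[:k]
```

Because both representations live on the same record, no cross-index fusion step is needed — which is the simplicity the pattern buys, and the coupling it pays for.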
Trade-offs side by side¶
| Axis | Separate indexes | Combined index |
|---|---|---|
| Per-modality tuning | Independent — different BM25 k1/b per workload, different HNSW parameters per workload | Constrained — the shared index structure dictates what can vary |
| Scaling | Independent — scale lexical tier without touching vector tier | Coupled — they scale together |
| Operational complexity | Higher — two pipelines, two health models, two ingest-path failure modes | Lower — one index, one pipeline |
| Fusion | Needed (RRF / RSF / weighted sum / interleave) — scores aren't comparable by default | Often handled by the engine natively (single-pass multi-modal score) |
| Ingestion cost | Both sides indexed separately; more total work | Single indexing pass |
| Query latency | Two round-trips (serial) or two fan-outs (parallel); fusion step adds latency | One round-trip, engine-native combination |
| Maturity of each modality | Lexical-first deployments: strong BM25; added vectors may be newer | Vector-first deployments: strong vectors; added lexical (via sparse) may be less mature |
| Experimentation | Easy — swap one engine, one scoring function, one fusion algorithm without touching the other | Harder — experiments are bounded by what the combined index lets you vary |
Signature failure modes¶
Separate-indexes failure: ingestion drift¶
Two independent ingestion pipelines mean the two indexes can drift: a document present in the lexical index is missing from the vector index (embedding generation failed, or an embedding-model upgrade re-indexed only half the corpus), and hybrid queries silently return degraded results. Mitigations: shared document-ID sequencing, cross-index consistency checkers, periodic re-index discrepancy reports.
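A cross-index consistency checker of the kind mentioned above can be as simple as a set diff over document IDs pulled from each index. The function below is a hedged sketch — how you enumerate IDs from each index is deployment-specific:

```python
def index_drift_report(lexical_ids: set[str], vector_ids: set[str]) -> dict[str, set[str]]:
    # Any document present in one index but absent from the other will
    # silently degrade hybrid results for every query that matches it.
    return {
        "missing_embedding": lexical_ids - vector_ids,  # indexed, never embedded
        "missing_lexical": vector_ids - lexical_ids,    # embedded, never tokenized
    }
```

Run it on a schedule and alert whenever either set is non-empty; during an embedding-model migration, the `missing_embedding` set doubles as a re-index progress report.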
Combined-index failure: can't tune independently¶
Need to raise BM25 recall without blowing up the vector ANN memory? Can't — the index structure dictates both. Need to A/B test a new embedding model? The whole index has to be swapped at once. Operational simplicity costs tuning flexibility.
When to pick which¶
Pick separate indexes when:¶
- Advanced lexical features matter — phrase queries, proximity, per-field boosts, language-specific analyzers, stemming variations. Inverted-index implementations (Lucene-family) lead here.
- Modality-specific scaling shapes differ sharply — e.g. vector queries are 10× the QPS of lexical, or you want to put search traffic on a GPU tier.
- You need to experiment independently with fusion strategies, different embedding models, new scoring functions.
- Your team has the operational maturity to run two pipelines without drift.
MongoDB's thesis is that for advanced lexical requirements, a lexical-first substrate with separate indexes is the optimal shape — pairing BM25 on Lucene with vector ANN in a second index.
Pick combined index when:¶
- Operational simplicity dominates — smaller team, less bandwidth for dual-pipeline management.
- Lexical requirements are basic — mostly keyword matching without phrase / proximity / boost complexity.
- Latency is critical — a single-pass multi-modal query avoids the fan-out + fusion overhead.
- You're already invested in a vector-first platform and don't want to introduce a second search tier.
Related compound pattern: dual native access¶
The two topologies aren't always a binary choice — vendors like MongoDB Atlas offer separate indexes exposed through one unified query language. Atlas Search + Atlas Vector Search are physically separate (Search Nodes host them on their own compute tier for independent scaling), but the MQL $search and $vectorSearch aggregation stages sit in the same pipeline. A native hybrid-search function further unifies this by handling fusion at the engine level. This composite is "separate indexes, combined surface" — the control of separate physical tiers with the simplicity of a single query API.
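The "combined surface" can be sketched as one aggregation pipeline. `$search` and `$vectorSearch` are real Atlas stages; the `$rankFusion` shape below follows our reading of MongoDB's documented hybrid-search syntax (server 8.1+) and should be verified against your deployment, and all index names, field paths, and the query vector are hypothetical placeholders.

```python
# One pipeline, two physically separate indexes: engine-level fusion
# replaces a client-side RRF/RSF step. Pass this list to a pymongo
# collection.aggregate(...) call against a suitable Atlas cluster.
hybrid_pipeline = [
    {
        "$rankFusion": {
            "input": {
                "pipelines": {
                    # Lexical arm: BM25 over the Atlas Search (Lucene) index.
                    "lexical": [
                        {"$search": {
                            "index": "articles_text",  # hypothetical index name
                            "text": {"query": "hybrid search", "path": "body"},
                        }}
                    ],
                    # Vector arm: ANN over the Atlas Vector Search index.
                    "semantic": [
                        {"$vectorSearch": {
                            "index": "articles_vec",            # hypothetical index name
                            "path": "embedding",
                            "queryVector": [0.12, 0.08, 0.33],  # stand-in embedding
                            "numCandidates": 100,
                            "limit": 20,
                        }}
                    ],
                }
            },
            "combination": {"weights": {"lexical": 1, "semantic": 2}},
        }
    },
    {"$limit": 10},
]
```

The two arms still hit separate physical indexes with separate scaling, but the caller sees a single query and never handles fusion.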
Seen in¶
- sources/2025-09-30-mongodb-top-considerations-when-choosing-a-hybrid-search-solution — MongoDB names the separate-vs-combined distinction as one of the key architectural trade-offs in hybrid-search deployment; aligns the distinction with vendor-origin bias ("Lexical-first solutions … hybrid setups that use separate indexes. Vector-first platforms … tend to use a single index").
Related¶
- concepts/hybrid-retrieval-bm25-vectors — the retrieval stack this pattern architects.
- concepts/reciprocal-rank-fusion — fusion algorithm typically paired with separate indexes.
- concepts/relative-score-fusion — alternative fusion algorithm for separate indexes.
- concepts/sparse-vector — the combined-index modality for lexical in vector-first platforms.
- patterns/native-hybrid-search-function — composable on top of separate indexes to regain combined-index simplicity at the query-API layer.
- patterns/independent-scaling-tiers — the generic scaling pattern separate indexes realize when the two modalities diverge in resource profile.
- systems/atlas-hybrid-search — MongoDB's productized instance of separate-indexes-with-unified-query-API.