
PATTERN Cited by 1 source

Consolidate database and search

Intent

Collapse the "primary database + bolt-on search engine + sync pipeline" shape into a single substrate where search is a first-class capability of the primary database and queries are served from the same cluster without an intermediate derived-index pipeline. Eliminates the synchronization tax as a structural property, not as an operational discipline to sustain.

Context: the shape being left

Most production systems of record reach a point where full-text / relevance search outgrows the primary database's built-in capability (equality / range / LIKE / basic trigram). The default move is to add a dedicated search engine beside the primary:

┌─────────────┐   CDC / dual-write   ┌────────────────┐
│  Primary DB │ ────────────────────▶│  Search index  │
│  (Postgres, │                       │  (ES, OS, Solr,│
│   MySQL,    │                       │   Algolia)     │
│   ...)      │                       └────────────────┘
└─────────────┘

This works at small scale. At scale it accumulates:

  • Pipeline-engineering toil (dual-write race logic, CDC lag, schema-change coordination across two systems).
  • Real-time-freshness ceilings ("why does the agent dashboard show old data?").
  • Operational doubling (capacity / patching / upgrade / backup / IAM / on-call for both systems).
  • Developer-onboarding cost (new hires must learn both systems' failure modes — a real concern at 10×-in-a-year hiring velocity, per Cars24's Sharma quote).

The consolidation pattern attacks the pipeline itself rather than the pipeline's bugs.
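What "dual-write race logic" costs can be seen in a few lines: when the write to the bolt-on index fails after the primary commit, the two stores diverge silently, and detecting the gap becomes the application's job. A toy sketch, with plain dicts standing in for the two systems:

```python
# Illustration of the dual-write divergence the pattern eliminates.
# Both stores are dicts standing in for the primary DB and the search engine.

primary = {}       # system of record
search_index = {}  # bolt-on search engine

def dual_write(doc_id, doc, index_is_up=True):
    primary[doc_id] = doc      # commit to the primary first
    if not index_is_up:        # transient index outage or timeout...
        return False           # ...and the two stores now silently disagree
    search_index[doc_id] = doc
    return True

dual_write("car-1", {"title": "2019 Honda City"})
dual_write("car-2", {"title": "2021 Kia Seltos"}, index_is_up=False)

# car-2 is readable by primary key but invisible to search until some
# repair job (CDC replay, backfill, manual reindex) notices the gap.
missing = set(primary) - set(search_index)
```

Every mitigation for this (retry queues, CDC, reconciliation jobs) is the pipeline toil the bullet list above names; consolidation removes the second write entirely.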

Mechanism

  1. Pick a primary database whose vendor ships embedded search as a peer capability, not a tacked-on feature. Named instances:
       • MongoDB Atlas + Atlas Search (BM25 on Lucene) — Cars24's canonical case.
       • PostgreSQL full-text search (tsvector / tsquery / GIN / ts_rank) + pgvector for semantic.
       • Oracle Text, SQL Server Full-Text Search — legacy analogues.
  2. The index is populated as a side effect of the primary write — same cluster, no separate pipeline. Atlas Search maintains its Lucene segments directly from MongoDB's oplog; Postgres FTS updates its GIN index inside the same MVCC transaction as the row write.
  3. Queries use a single API surface — MQL aggregation stages on Atlas, SELECT ... WHERE to_tsvector @@ to_tsquery in Postgres — so the application layer routes by intent within one driver, not across two systems.
  4. Consolidate identity / auth / backup / monitoring under the primary's envelope. Atlas Search inherits the database's IAM, VPC peering, audit logs, and backup / PITR; Postgres FTS lives inside pg_dump.
  5. Split compute when needed (optional). MongoDB Search Nodes let the search workload scale independently of OLTP without reintroducing a separate system — same data, same Atlas control plane, separate compute. The Postgres equivalent: read replicas dedicated to FTS queries.
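The single-query-surface step can be sketched as data. The Atlas `$search` aggregation stage and the `to_tsvector @@ to_tsquery`-style predicate are the documented query shapes; the collection/table names, index name, and field paths below are illustrative assumptions:

```python
# One application layer, one driver per stack: the relevance query is just
# another statement against the primary, not a call to a second system.

def atlas_search_pipeline(terms: str) -> list[dict]:
    """MQL aggregation pipeline: $search (Atlas Search, Lucene BM25) must
    be the first stage, followed by ordinary aggregation stages."""
    return [
        {"$search": {"index": "default",
                     "text": {"query": terms, "path": "description"}}},
        {"$limit": 20},
    ]

def postgres_fts_query() -> str:
    """Parameterized SQL against the same cluster that holds the rows."""
    return (
        "SELECT id, title, ts_rank(search_tsv, q) AS rank "
        "FROM listings, plainto_tsquery('english', %s) AS q "
        "WHERE search_tsv @@ q "
        "ORDER BY rank DESC LIMIT 20"
    )

pipeline = atlas_search_pipeline("honda city automatic")
sql = postgres_fts_query()
```

In both cases the relevance query travels over the driver and credentials the application already has, which is what makes the routing-by-intent a code-path decision rather than a systems decision.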

Structural guarantees

  • No independent sync pipeline exists to fail: the index is maintained from the primary write, inside the same cluster.
  • Freshness is bounded by the primary's commit (Postgres FTS) or by intra-cluster index maintenance (Atlas Search), not by an external pipeline's lag. Stale-index-driven product pain disappears structurally.
  • Schema changes hit one system. Adding a field to a document automatically exposes it to the search engine via the same cluster's schema mechanism; no mapping file to maintain.
  • One on-call rotation, one capacity model, one upgrade cadence, one IAM surface.
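On the Postgres side, the same-transaction index maintenance is visible directly in the DDL: a stored generated tsvector column plus a GIN index over it, both rewritten by the row write itself. A minimal sketch, assuming Postgres 12+ and illustrative table/column names:

```sql
-- Postgres 12+: the tsvector is a stored generated column, so every
-- INSERT/UPDATE on listings rewrites it in the same transaction.
ALTER TABLE listings
  ADD COLUMN search_tsv tsvector
  GENERATED ALWAYS AS (
    to_tsvector('english', coalesce(title, '') || ' ' || coalesce(description, ''))
  ) STORED;

-- The GIN index over that column is likewise updated by the row write;
-- there is no refresh job and no pipeline to lag.
CREATE INDEX listings_search_idx ON listings USING GIN (search_tsv);
```

The schema-change guarantee follows from the same shape: adding a field to the search surface means editing this one expression, not coordinating a mapping file in a second system.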

Trade-offs

  • Feature-parity ceiling. Embedded search engines typically lag dedicated ones on advanced features — custom analyzers, percolator queries, cross-cluster search, ILM tiering, sparse retrievers (ELSER). Validate your critical queries against the embedded engine before committing.
  • Resource coupling. One cluster carries both workloads; search's memory-hungry analyzers + OLTP's latency-sensitive reads compete. The escape hatch is a dedicated search-tier (Atlas Search Nodes, Postgres read replicas) — scale independently but still inside the primary's envelope.
  • Vendor lock-in deepens. Atlas Search is MongoDB-only; Postgres FTS doesn't port to MySQL. The pattern trades vendor-diversity optionality for operational simplicity.
  • Migration cost is not zero. Reindexing the primary, adjusting queries, validating relevance parity — effectively the same five-phase migration shape, applied to a search-engine swap instead of a managed-service swap.

When it applies

  • Your search feature surface is broadly served by the embedded engine (BM25 + analyzers + basic aggregations; most line-of-business search fits).
  • Your organization is growing fast and developer-onboarding cost compounds on per-system learning curve (Cars24 10×/year hiring).
  • The sync pipeline has become a named on-call surface / ticket class ("synchronization challenges and overwhelming administrative overhead").
  • You want one consistent operational envelope (HA + encryption + audit + backup) across everything the application touches.

When it doesn't apply

  • Your search workload relies on advanced Elasticsearch/OpenSearch features the embedded engine doesn't provide (ELSER, learned sparse retrievers, ILM tiering with S3 cold storage, percolator queries, advanced cross-cluster replication).
  • You need independently-scaling search clusters per product line or per tenant, sized differently from the primary.
  • You're running multi-primary (MongoDB + Postgres + MySQL) and the consolidation would fragment search across three embedded engines instead of unifying on one dedicated one.
  • A dedicated search cluster is already a well-owned platform component (specialized team, mature pipeline, negligible synchronization-tax delta); migrating to an embedded engine to chase the "one system" simplicity isn't worth the disruption.

patterns/native-hybrid-search-function extends the same consolidation logic to lexical + vector — instead of adding a third bolt-on (vector DB) beside the primary + search engine, the primary vendor ships hybrid lexical-vector retrieval as a first-class API. Atlas Search + Atlas Vector Search + Atlas Hybrid Search on MongoDB is the paradigmatic stack; Postgres + pgvector + tsvector + hybrid-ranking extensions is the Postgres-side analogue.

The two patterns compose: consolidate DB + search first, then add native hybrid without introducing a separate vector store.
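The hybrid composition can be illustrated with reciprocal rank fusion (RRF), a standard score-free way to merge a lexical ranking with a vector ranking into one result list; the document IDs and orderings below are invented:

```python
# Reciprocal rank fusion: merge a lexical (BM25) ranking and a vector (ANN)
# ranking without having to reconcile their incompatible score scales.

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Each ranking is a list of doc IDs, best first. k=60 is the
    conventional damping constant from the original RRF formulation."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["car-7", "car-3", "car-9"]   # BM25 order (made-up IDs)
vector  = ["car-3", "car-5", "car-7"]   # ANN order

fused = rrf([lexical, vector])
```

When the primary vendor ships hybrid retrieval natively, this fusion happens inside the one cluster; the point of the composed patterns is that neither ranking requires a second store.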

Contrasting patterns

  • patterns/separate-vs-combined-index is orthogonal — it concerns whether lexical + vector live in one index or two within the search substrate. Consolidation operates a level up: whether search lives on a separate substrate from the primary DB at all.
  • Query federation (unified API, separate stores) keeps the bolt-on but hides it behind one endpoint. Doesn't eliminate the synchronization tax, just relocates it.
  • patterns/dual-write-migration is the migration tool applied during consolidation — old and new coexist while search traffic shifts.

Costs

  • Reindexing primary into the embedded engine (mostly one-time).
  • Migrating all queries from the old engine's DSL to the embedded engine's DSL (application code).
  • Relevance-parity validation against production query log (pre / post A/B on NDCG, click-through, time-to-first-click if measured).
  • Capacity uplift on the primary cluster to absorb search workload (or provision dedicated search nodes).
  • Training (everyone on the new search query shape).
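The relevance-parity validation can be made concrete as NDCG@k over a sampled production query log: score the old engine's ranking and the embedded engine's ranking against the same graded judgments and track the delta. A minimal sketch with invented judgments and rankings:

```python
import math

def dcg(relevances: list[float]) -> float:
    """Discounted cumulative gain of a ranked list of graded relevances."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg_at_k(ranking: list[str], judgments: dict[str, float], k: int = 10) -> float:
    """NDCG@k: DCG of the ranking, normalized by the ideal ordering."""
    gains = [judgments.get(doc, 0.0) for doc in ranking[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Graded judgments for one sampled query (invented data).
judgments = {"d1": 3.0, "d2": 2.0, "d3": 1.0}

old_engine = ["d1", "d2", "d3"]   # dedicated engine's order
new_engine = ["d2", "d1", "d3"]   # embedded engine's order

parity_delta = ndcg_at_k(old_engine, judgments) - ndcg_at_k(new_engine, judgments)
```

Run per query across the log and aggregate; a small, explainable delta is the go signal, while a large one means the embedded engine's analyzers or ranking need tuning before cutover.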

Seen in

  • sources/2025-10-12-mongodb-cars24-improves-search-for-300-million-users-with-atlas — canonical instance. Cars24 left Postgres + a bolt-on Elasticsearch-class search index (with sync pipelines maintained by multiple engineering teams) for MongoDB Atlas on Google Cloud, with Atlas Search (BM25 on Lucene) running inside the same cluster as the operational store. MongoDB named the cost class explicitly: "Avoid synchronization tax." A second Cars24 migration in the same article — ArangoDB → MongoDB for geospatial search at 50% cost savings — is a database-consolidation instance of the same shape (consolidate the specialized store into the general-purpose primary).