PATTERN Cited by 1 source

Dual-store graph + search index¶

Definition¶

The dual-store graph + search index pattern is a storage shape for catalog / metadata / discovery services where:

A graph database (or immutable fact store) is the system of record — used for relationship-heavy navigational queries (multi-hop traversal, lineage, impact analysis).
A search index (Elasticsearch / OpenSearch / Lucene) is a derived view — used for free-text discovery, faceted filter, fuzzy match.

Writes go to both stores synchronously on the ingestion path, so both converge on the same canonical state, with the search index trailing the system of record only briefly.

The Netflix MDS instance¶

"Once normalized, entities are persisted to Datomic, which serves as both a local cache and a graph database. Immediately after writing to Datomic, entities are indexed in Elasticsearch to power fast, full-text search across the catalog." — sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph

The split:

Access pattern	Store	Why
Free-text search box, type-as-you-go discovery	systems/elasticsearch	BM25, fuzzy match, faceted filter, relevance boosting
"Walk from this model to its features to their data sources"	systems/datomic	Multi-hop graph traversal via Datalog
"List all models owned by this team"	systems/elasticsearch	Filter by tag
"Which A/B tests use this model?"	systems/datomic	Reified edge lookup
Entity detail page (description + owners + tags)	Either; ES is faster for warm cache	—

"Elasticsearch powers the entry point into the system: users typically start with a free-text search in the AIP Portal (for a model name, a team, or a domain term), and then switch to graph navigation once they land on an entity page." — sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph

Why two stores¶

Search engines and graph databases have complementary strengths:

Capability	Search engine	Graph DB
Fuzzy / typo-tolerant text search	✅	❌
Free-text relevance scoring	✅	❌
Cross-field aggregation / facet	✅	partial
Multi-hop traversal	❌	✅
Reified-edge queries (both directions)	❌	✅
Recursive / transitive-closure queries	❌	✅
Schema flexibility	✅	varies

Graph traversal in Elasticsearch degenerates into N+1 queries (one lookup per node at each hop) with the join done client-side; free-text search in a graph DB that lacks an inverted index degenerates into slow scans. Using both stores lets each query shape run on the data structure built for it.
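The contrast can be made concrete with a toy, in-memory sketch. This is an illustration only, not how Datomic or Elasticsearch are implemented: the URIs, the `EDGES` and `DOCS` data, and both function names are hypothetical. It shows that a multi-hop walk wants an adjacency structure, while a text lookup wants a token-to-document map.

```python
from collections import defaultdict, deque

# Hypothetical toy data: directed edges from an entity URI to its dependencies.
EDGES = {
    "model:ranking-v5": ["feature:watch-time", "feature:genre-affinity"],
    "feature:watch-time": ["table:playback_events"],
    "feature:genre-affinity": ["table:title_metadata"],
}

# Hypothetical toy documents: entity URI -> free-text description.
DOCS = {
    "model:ranking-v5": "ranking model for homepage personalization",
    "feature:watch-time": "aggregated watch time per member",
}


def reachable(start, edges):
    """Multi-hop traversal: every node reachable from `start` (breadth-first)."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def build_inverted_index(docs):
    """Map each lowercase token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


INDEX = build_inverted_index(DOCS)
```

Answering "what does ranking-v5 depend on?" from `INDEX` alone would need one lookup per node per hop, and answering "which entities mention watch?" from `EDGES` alone would need a scan of every node.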

The synchronous-on-ingest dual write¶

"This happens synchronously within the event processing flow." — sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph

Specifically: write Datomic first (system of record), then Elasticsearch (derived view). If the ES write fails, the event is not considered fully processed and can be retried. ES is therefore eventually consistent with Datomic, lagging only briefly on the happy path:

"Indexing happens in near real-time as part of the ingestion and enrichment workflows, so changes are usually visible in the Portal with a short delay that is acceptable for interactive use." — sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph

When async enrichment derives new edges (patterns/async-graph-enrichment-job), both stores are updated again.
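A minimal sketch of the write ordering described above, using in-memory stand-ins rather than real Datomic or Elasticsearch clients. Everything here is an assumption made for illustration: `process_event`, `FlakyIndex`, `TransientIndexError`, the `max_attempts` parameter, and the bounded-retry policy are hypothetical and not taken from the Netflix source.

```python
class TransientIndexError(Exception):
    """Raised by the (hypothetical) search index stand-in on a failed write."""


class FlakyIndex:
    """In-memory stand-in for a search index that fails its first N writes."""

    def __init__(self, failures):
        self.failures = failures
        self.docs = {}

    def index(self, doc_id, doc):
        if self.failures > 0:
            self.failures -= 1
            raise TransientIndexError(doc_id)
        self.docs[doc_id] = doc


def process_event(entity, graph_store, search_index, max_attempts=3):
    """Write the system of record first, then the derived view.

    Returns True only when both writes succeeded. False means the event is
    not fully processed and should be redelivered or reconciled later.
    """
    # 1. System of record. If this raises, the index is never touched.
    graph_store[entity["uri"]] = entity

    # 2. Derived view, retried a bounded number of times.
    for _ in range(max_attempts):
        try:
            search_index.index(entity["uri"], entity)
            return True
        except TransientIndexError:
            continue
    return False
```

Returning `False` instead of raising leaves the caller free to choose between redelivering the event and handing the URI to a reconciliation job.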

Index-layout decision: single index + entityType discriminator¶

A subtle but important Elasticsearch-side decision in Netflix MDS:

"Single entities index: All entity types (models, features, pipelines, etc.) are indexed in one unified index, differentiated by the entityType field. Separate owners index: Dedicated index for users and groups to enable cross-entity owner searches." — sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph

This makes a free-text search for "ranking-v5" a single query against a single index, returning models, pipelines, features, and so on; the entityType field can then narrow the results to one type. It is the indexing-side counterpart of the unified URI namespace (concepts/entity-uri-namespace).

Owners get their own index because "who owns anything called ranking" is a different query shape than "find entities matching name ranking-v5".
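As a sketch of what a query against the unified index might look like, the function below builds an Elasticsearch query DSL body as a plain Python dict. The `bool`, `multi_match`, `term`, and `filter` constructs are standard query DSL; the `name` and `description` field names, the `^2` boost, and the assumption that `entityType` is mapped as a keyword field are illustrative guesses, not details from the Netflix source.

```python
def build_entity_search(text, entity_type=None):
    """Build a query DSL body for a single index holding all entity types.

    `name` and `description` are hypothetical field names; `entityType` is
    assumed to be mapped as a keyword so that an exact `term` filter works.
    """
    query = {
        "bool": {
            "must": [
                {
                    "multi_match": {
                        "query": text,
                        "fields": ["name^2", "description"],
                    }
                }
            ]
        }
    }
    if entity_type is not None:
        # Filter context narrows the result set without changing scores.
        query["bool"]["filter"] = [{"term": {"entityType": entity_type}}]
    return {"query": query}
```

Calling `build_entity_search("ranking-v5")` searches every entity type at once, while `build_entity_search("ranking-v5", "model")` restricts the same query to models without needing a second index.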

Tags as key-value pairs¶

"Tags: Domain-specific metadata stored as key-value pairs (e.g., team::personalization, env::production, model.state::released)." — sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph

Tags are stored as Elasticsearch fields, indexed for filtering, and resolvable back to the same data in Datomic: a single canonical representation held in two stores.
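One way to turn the `key::value` strings from the quote into a structure that can be indexed for filtering is sketched below. The `parse_tags` helper, its multi-value handling, and its error behaviour are assumptions made for illustration; the source only states that tags are key-value pairs.

```python
def parse_tags(tags):
    """Split 'key::value' strings into a dict of key -> sorted list of values.

    A key may appear more than once, so values are collected per key.
    Raises ValueError for a tag without a '::' separator or with an empty side.
    """
    parsed = {}
    for tag in tags:
        key, sep, value = tag.partition("::")
        if not sep or not key or not value:
            raise ValueError(f"malformed tag: {tag!r}")
        parsed.setdefault(key, set()).add(value)
    return {key: sorted(values) for key, values in parsed.items()}
```

For the tags in the quote, `parse_tags(["team::personalization", "env::production", "model.state::released"])` yields one entry per key, each holding a single-element list.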

Failure modes to design for¶

ES write fails after Datomic write succeeds. Datomic is ahead of ES. The next read against ES sees stale state. Mitigate with retry queue + reconciliation job.
Datomic write fails. Don't write to ES; the stores stay consistent because neither has the new state.
Async enrichment derives new edge in Datomic, ES re-index fails. Same reconciliation pattern.
ES schema migration. Re-index from Datomic; Datomic is the source of truth.
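The reconciliation job mentioned in the list above can be sketched as a diff that treats the graph store as truth. Both stores are modelled as plain dicts keyed by entity URI; the `reconcile` function and this representation are hypothetical and only illustrate the direction of repair (always from the system of record into the derived view).

```python
def reconcile(graph_store, search_docs):
    """Make `search_docs` match `graph_store`, treating the graph as truth.

    Re-indexes entities that are missing or stale in the index and deletes
    index entries with no counterpart in the graph.
    Returns (reindexed, deleted) as sorted lists of URIs.
    """
    reindexed = []
    for uri, entity in graph_store.items():
        if search_docs.get(uri) != entity:
            search_docs[uri] = dict(entity)  # copy so the stores do not alias
            reindexed.append(uri)

    deleted = [uri for uri in search_docs if uri not in graph_store]
    for uri in deleted:
        del search_docs[uri]

    return sorted(reindexed), sorted(deleted)
```

Because the repair only ever flows from the graph store to the index, running it twice in a row is harmless: the second pass finds nothing to change.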

When to use it¶

Catalog / metadata / discovery service.
Both find (free-text) and explore (graph) are first-class user actions.
Dataset is large enough that walking the graph at search time would be too slow.
Schema evolves (new entity types, new attributes) — append to facts in the graph store, re-index.

When not to use it¶

Pure search workload — just use Elasticsearch.
Pure graph workload — just use a graph DB.
The dataset is small enough that one store can handle both shapes.
Operational cost of running two stores exceeds the value of the split.

Related¶

patterns/thin-event-plus-source-hydration — ingestion that feeds the dual store.
patterns/async-graph-enrichment-job — derives new edges that must be propagated to both stores.

Seen in¶

sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph — Netflix MDS uses Datomic (graph) + Elasticsearch (search) as the canonical wiki instance of this pattern.