CONCEPT Cited by 1 source
Multi-hop relationship materialization¶
Definition¶
Multi-hop relationship materialization is the discipline of walking N-hop paths in a graph and writing back a direct, single-hop edge between the endpoints as a new fact, so that future queries hit one edge instead of repeating the N-hop walk. It trades enrichment-time compute for lower query-time latency.
The pattern is the graph-database analog of a materialized view on a join: precompute the result of the multi-hop join once, query it many times.
The Netflix MDS worked example¶
"MDS doesn't just store what it's told; it derives new knowledge by walking the graph in the background." — sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph
Connecting a model instance to its A/B tests, in three hops:
Model Instance --produced-by--> Pipeline Run
Pipeline Run --executed-for--> A/B Test Cell
A/B Test Cell --belongs-to--> A/B Test
Steps the enrichment job runs:
- Direct link to pipeline. The model instance has a `pipeline_run_id` foreign key. Hydrate the pipeline run via `GET /api/v1/pipeline-runs/train-weekly-ranking-20XX0101`. The response reveals an `ab_test_cells` field.
- Discover A/B test context. For each cell, hydrate the test via `GET /api/v1/tests/12345`. The response reveals the test name, status, and other cells (control + treatment).
- Infer transitive relationship. The enrichment job now has the chain. It writes the inferred edge `Model Instance ↔ A/B Test` back to Datomic and triggers re-indexing.
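The three steps above can be sketched in Python. This is a minimal illustration, not the MDS implementation: the fetcher callables, the `test_id` field on each cell, the `id` fields, and the `tested-in` relation name are all assumptions made for the example. Only `pipeline_run_id` and `ab_test_cells` come from the description above, and the write-back to Datomic is left out.

```python
from typing import Callable

Edge = tuple[str, str, str]  # (source entity id, relation name, target entity id)


def derive_model_test_edges(
    model_instance: dict,
    fetch_pipeline_run: Callable[[str], dict],
    fetch_test: Callable[[str], dict],
) -> set[Edge]:
    """Walk model instance -> pipeline run -> cell -> test; return direct edges."""
    # Hop 1: follow the foreign key from the model instance to its pipeline run.
    run = fetch_pipeline_run(model_instance["pipeline_run_id"])
    edges: set[Edge] = set()
    # Hop 2: the run lists its A/B test cells (the cell shape is assumed).
    for cell in run.get("ab_test_cells", []):
        # Hop 3: hydrate the test that the cell belongs to.
        test = fetch_test(cell["test_id"])
        # Collapse the three hops into one direct edge; the set deduplicates
        # the case where several cells belong to the same test.
        edges.add((model_instance["id"], "tested-in", test["id"]))
    return edges


# Canned responses standing in for the live source-system APIs (hypothetical data).
RUNS = {"run-1": {"ab_test_cells": [
    {"cell_id": "control", "test_id": "12345"},
    {"cell_id": "treatment", "test_id": "12345"},
]}}
TESTS = {"12345": {"id": "12345", "name": "ranking-test", "status": "running"}}

model = {"id": "model-7", "pipeline_run_id": "run-1"}
edges = derive_model_test_edges(model, RUNS.__getitem__, TESTS.__getitem__)
print(edges)
```

Both cells resolve to the same test, so the walk yields a single derived edge. In a real job the returned edges would then be written to the graph store and re-indexed.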
"The job writes the inferred relationship back to Datomic and triggers re-indexing, and materializes these edges in the graph." — sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph
Why materialize the edges?¶
Three structural reasons to write back the direct edge instead of re-walking the chain on every query:
1. Query simplicity¶
After materialization, "Which A/B tests use this model?" is a single edge lookup. Without it, every query would re-walk the 3-hop path through three source systems' data — a 3-system join at query time.
2. Bidirectional queryability¶
"The reverse query also works: 'What models are being tested in experiment 12345?'" — sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph
If the edge is materialized, both endpoints can index into it. A walk-on-query approach forces the query engine to choose a direction; a materialized edge is direction-agnostic. See concepts/reified-edge-graph.
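To make the direction-agnostic point concrete, here is a small in-memory sketch in which one materialized edge is indexed from both endpoints. The class and method names are hypothetical, and this says nothing about how Datomic or the MDS search index actually stores edges.

```python
from collections import defaultdict


class MaterializedEdgeIndex:
    """Indexes each derived Model Instance <-> A/B Test edge from both ends."""

    def __init__(self) -> None:
        self._tests_by_model: dict[str, set[str]] = defaultdict(set)
        self._models_by_test: dict[str, set[str]] = defaultdict(set)

    def add(self, model_id: str, test_id: str) -> None:
        # One logical edge, two index entries: one per endpoint.
        self._tests_by_model[model_id].add(test_id)
        self._models_by_test[test_id].add(model_id)

    def tests_for_model(self, model_id: str) -> set[str]:
        # Forward query: "Which A/B tests use this model?"
        return set(self._tests_by_model.get(model_id, ()))

    def models_for_test(self, test_id: str) -> set[str]:
        # Reverse query: "What models are being tested in this experiment?"
        return set(self._models_by_test.get(test_id, ()))


index = MaterializedEdgeIndex()
index.add("model-7", "12345")
index.add("model-9", "12345")

forward = index.tests_for_model("model-7")
reverse = index.models_for_test("12345")
```

Both lookups are single dictionary reads; neither needs to know which direction the original three-hop walk took.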
3. Cross-source-system path queries become local¶
Without materialization, a multi-hop query against a 3-source-system path would require live API calls to all three source systems for every query. Materialization moves the multi-source join from query time to enrichment time, and the enrichment cost is amortized across all subsequent reads.
The cost: write amplification¶
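Each materialized edge is a write. For a graph with N entities, where each entity participates in K derivable relationships on average and each derivation walks a path of average length L, enrichment does O(N · K · L) work over time: O(N · K) derived edges written, each costing L hops to discover. Compared to the alternative of re-walking on read, materialization trades: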
- More writes (during enrichment).
- More storage (reified edges).
- Higher fan-out on entity changes (a change to one entity may invalidate many derived edges that need to be re-materialized).
For:
- Faster reads.
- Bidirectional indexability.
- Predictable query latency (no live source-API hops).
The tradeoff is the right shape for read-heavy catalog / lineage / discovery workloads, where each entity is read many times for each time it changes.
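A back-of-the-envelope model of that tradeoff, as a sketch only: the cost parameters and the numbers in the usage lines are made-up illustrative units, not measurements from the post. It compares, per entity change, the cost of walking on every read against walking once, writing the edge, and serving reads from a lookup.

```python
def cheaper_to_materialize(
    reads_per_change: float,
    path_length: int,
    hop_cost: float,
    write_cost: float,
    lookup_cost: float,
) -> bool:
    """Return True if materializing costs less than re-walking on every read."""
    # Walk-on-read: every read pays for all hops.
    walk_on_read = reads_per_change * path_length * hop_cost
    # Materialized: one walk plus one write per change, then cheap lookups.
    materialized = path_length * hop_cost + write_cost + reads_per_change * lookup_cost
    return materialized < walk_on_read


# Hypothetical units: a hop costs 10, a write 5, an edge lookup 1.
read_heavy = cheaper_to_materialize(100, 3, hop_cost=10, write_cost=5, lookup_cost=1)
write_heavy = cheaper_to_materialize(1, 3, hop_cost=10, write_cost=5, lookup_cost=1)
```

With 100 reads per change, the materialized path costs 135 units against 3000 for walking on read. With one read per change it costs 36 against 30, so materialization loses.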
Re-materialization on entity change¶
If an upstream entity changes, derived edges may become stale. The async enrichment loop handles this by:
- Marking entities uncached when they change (concepts/async-relationship-inference).
- Re-walking and re-materializing affected paths on the next enrichment cycle.
The post is explicit that this is bounded: "newly discovered relationships may appear with a short delay after the underlying entities are created (typically minutes rather than seconds)." The same applies to re-materialization after change.
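One way to picture the invalidate-then-re-walk loop is sketched below. It assumes each derived edge remembers the entities on the path it was derived from, which is a design choice for this example and not something the post describes. Stale edges stay readable until the next cycle replaces them, matching the eventual-consistency behavior quoted above.

```python
from typing import Callable, Iterable

Path = tuple[str, ...]


class DerivedEdgeStore:
    def __init__(self) -> None:
        self.paths: dict[tuple[str, str], Path] = {}  # (src, dst) -> walked path
        self.dirty: set[str] = set()                  # sources awaiting a re-walk

    def materialize(self, src: str, dst: str, path: Iterable[str]) -> None:
        self.paths[(src, dst)] = tuple(path)

    def mark_changed(self, entity: str) -> None:
        # Linear scan for brevity; a real store would index edges by path member.
        for (src, _dst), path in self.paths.items():
            if entity in path:
                self.dirty.add(src)

    def run_enrichment_cycle(
        self, walk: Callable[[str], Iterable[tuple[str, Path]]]
    ) -> None:
        pending, self.dirty = self.dirty, set()
        for src in pending:
            # Drop the old derived edges for this source, then re-derive them.
            for key in [k for k in self.paths if k[0] == src]:
                del self.paths[key]
            for dst, path in walk(src):
                self.materialize(src, dst, path)


store = DerivedEdgeStore()
store.materialize("model-7", "test-12345", ["model-7", "run-1", "cell-a", "test-12345"])
store.mark_changed("run-1")  # an upstream entity on the path changed
stale_still_readable = ("model-7", "test-12345") in store.paths


def walk(src: str) -> list[tuple[str, Path]]:
    # Hypothetical re-walk result: the run now belongs to a different test.
    return [("test-67890", (src, "run-1", "cell-c", "test-67890"))]


store.run_enrichment_cycle(walk)
```

After the cycle, the edge to `test-12345` is gone and the edge to `test-67890` has taken its place.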
Distinct from query-time graph traversal¶
Modern graph databases (Neo4j, Datomic with Datalog) can express multi-hop queries directly. Why materialize?
| Aspect | Query-time walk | Materialized edge |
|---|---|---|
| Query latency | O(path length × per-hop join) | O(1) edge lookup |
| Storage cost | Just the direct edges | Direct + derived edges |
| Stale-edge risk | None (fresh on every query) | Yes (until re-materialized) |
| Cross-system source calls | Live, on every query | Once, during enrichment |
| Best for | Small graphs, infrequent multi-hop queries | Large graphs, frequent multi-hop queries |
For Netflix MDS, the graph spans six source systems and the multi-hop queries (lineage, impact analysis, A/B-test attribution) are the dominant read pattern, not an occasional one. Hence materialization is the right shape.
Seen in¶
- sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph — Netflix MDS materializes multi-hop edges (Model Instance ↔ A/B Test, Feature ↔ Consuming Model, Dataset ↔ Downstream Experiment) in async enrichment jobs.