CONCEPT Cited by 1 source

Multi-hop relationship materialization¶

Definition¶

Multi-hop relationship materialization is the discipline of walking N-step paths in a graph and writing back direct single-hop edges as new facts, so that future queries hit one edge instead of an N-hop walk. It trades enrichment-time compute for lower query-time latency.

The pattern is the graph-database analog of a materialized view on a join — precompute the transitive closure once, query it many times.
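As a minimal sketch of the definition above, the following composes a fixed sequence of relations and emits one direct edge per end-to-end walk. The edge tuples, the `materialize_path` helper, and the derived relation name `tested-in` are illustrative assumptions, not names from the source.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source, relation, target)


def materialize_path(edges: Iterable[Edge], path: List[str], derived: str) -> Set[Edge]:
    """Compose the relations in `path`; return one direct edge per end-to-end walk."""
    adjacency: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    starts: Set[str] = set()
    for source, relation, target in edges:
        adjacency[(source, relation)].add(target)
        if relation == path[0]:
            starts.add(source)  # only nodes with the first relation can start a walk

    direct: Set[Edge] = set()
    for start in starts:
        frontier = {start}
        for relation in path:  # one iteration per hop
            frontier = {t for node in frontier for t in adjacency.get((node, relation), ())}
        direct.update((start, derived, end) for end in frontier)
    return direct


# Hypothetical data: one model whose run served two cells of the same test.
edges = [
    ("model-1", "produced-by", "run-1"),
    ("run-1", "executed-for", "cell-a"),
    ("run-1", "executed-for", "cell-b"),
    ("cell-a", "belongs-to", "test-12345"),
    ("cell-b", "belongs-to", "test-12345"),
]
direct_edges = materialize_path(
    edges, ["produced-by", "executed-for", "belongs-to"], "tested-in"
)
```

Two distinct walks reach the same test, but the result is a set, so only one direct edge is written.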

The Netflix MDS worked example¶

"MDS doesn't just store what it's told; it derives new knowledge by walking the graph in the background." — sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph

Connecting a model instance to its A/B tests, in three hops:

Model Instance --produced-by--> Pipeline Run
Pipeline Run   --executed-for--> A/B Test Cell
A/B Test Cell  --belongs-to-->   A/B Test

Steps the enrichment job runs:

Direct link to pipeline. The model instance has a pipeline_run_id foreign key. Hydrate the pipeline run via GET /api/v1/pipeline-runs/train-weekly-ranking-20XX0101. The response reveals an ab_test_cells field.
Discover A/B test context. For each cell, hydrate the test via GET /api/v1/tests/12345. The response reveals the test name, status, and other cells (control + treatment).
Infer transitive relationship. The enrichment job now has the chain. It writes the inferred edge Model Instance ↔ A/B Test back to Datomic and triggers re-indexing.
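The three steps above can be sketched roughly as follows. This is an assumption-heavy illustration: the two `get_*` functions are in-memory stand-ins for the GET endpoints, the shape of the `ab_test_cells` payload (each cell carrying a `test_id`) is guessed, the `tested-in` relation name is hypothetical, and a plain Python set stands in for the Datomic write and re-indexing.

```python
from typing import Dict, Set, Tuple

# Hypothetical in-memory stand-ins for the two source-system endpoints.
PIPELINE_RUNS: Dict[str, dict] = {
    "train-weekly-ranking-20XX0101": {
        "ab_test_cells": [
            {"cell_id": "cell-control", "test_id": "12345"},
            {"cell_id": "cell-treatment", "test_id": "12345"},
        ]
    }
}
TESTS: Dict[str, dict] = {"12345": {"name": "example-test", "status": "running"}}


def get_pipeline_run(run_id: str) -> dict:
    return PIPELINE_RUNS[run_id]


def get_test(test_id: str) -> dict:
    return TESTS[test_id]


def enrich_model_instance(
    model_instance: dict, graph_edges: Set[Tuple[str, str, str]]
) -> Dict[str, dict]:
    """Walk model -> pipeline run -> cells -> tests, then write direct edges."""
    # Step 1: follow the foreign key and hydrate the pipeline run.
    run = get_pipeline_run(model_instance["pipeline_run_id"])

    # Step 2: hydrate the test behind each cell (cells may share a test).
    hydrated: Dict[str, dict] = {}
    for cell in run.get("ab_test_cells", []):
        hydrated[cell["test_id"]] = get_test(cell["test_id"])

    # Step 3: write the inferred model-to-test edges back to the graph.
    for test_id in hydrated:
        graph_edges.add((model_instance["id"], "tested-in", test_id))
    return hydrated


graph_edges: Set[Tuple[str, str, str]] = set()
model = {"id": "model-1", "pipeline_run_id": "train-weekly-ranking-20XX0101"}
tests_found = enrich_model_instance(model, graph_edges)
```

Both cells belong to the same test, so the job writes a single inferred edge for the model.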

"The job writes the inferred relationship back to Datomic and triggers re-indexing, and materializes these edges in the graph." — sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph

Why materialize the edges?¶

Three structural reasons to write back the direct edge instead of re-walking the chain on every query:

1. Query simplicity¶

After materialization, "Which A/B tests use this model?" is a single edge lookup. Without it, every query would re-walk the 3-hop path through three source systems' data — a 3-system join at query time.

2. Bidirectional queryability¶

"The reverse query also works: 'What models are being tested in experiment 12345?'" — sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph

If the edge is materialized, both endpoints can index into it. A walk-on-query approach forces the query engine to choose a direction; a materialized edge is direction-agnostic. See concepts/reified-edge-graph.
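One way to picture the direction-agnostic lookup is an edge store that maintains an index per endpoint. The `EdgeIndex` class and its method names below are hypothetical; the source does not describe how MDS indexes its edges.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class EdgeIndex:
    """Stores each materialized edge once and indexes it from both endpoints."""

    def __init__(self) -> None:
        self._by_model: DefaultDict[str, Set[str]] = defaultdict(set)
        self._by_test: DefaultDict[str, Set[str]] = defaultdict(set)

    def add(self, model_id: str, test_id: str) -> None:
        self._by_model[model_id].add(test_id)
        self._by_test[test_id].add(model_id)

    def tests_for_model(self, model_id: str) -> List[str]:
        # Forward question: which A/B tests use this model?
        return sorted(self._by_model.get(model_id, ()))

    def models_for_test(self, test_id: str) -> List[str]:
        # Reverse question: which models are tested in this experiment?
        return sorted(self._by_test.get(test_id, ()))


index = EdgeIndex()
index.add("model-1", "12345")
index.add("model-2", "12345")
```

Both questions are answered by a dictionary lookup, with no need to pick a traversal direction at query time.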

3. Cross-source-system path queries become local¶

Without materialization, a multi-hop query against a path spanning three source systems would require live API calls to all three systems on every query. Materialization moves the multi-source join from query time to enrichment time, and the enrichment cost is amortized across all subsequent reads.
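The amortization argument can be illustrated by counting live calls. In this toy sketch the `CountingSource` class and the three single-record sources are invented for illustration; real source systems would be remote APIs.

```python
class CountingSource:
    """Hypothetical source system that counts the live calls it receives."""

    def __init__(self, records: dict) -> None:
        self.records = records
        self.calls = 0

    def get(self, key: str) -> str:
        self.calls += 1
        return self.records[key]


models = CountingSource({"model-1": "run-1"})
runs = CountingSource({"run-1": "cell-a"})
cells = CountingSource({"cell-a": "test-12345"})
sources = (models, runs, cells)


def walk(model_id: str) -> str:
    # One live call per source system: model -> run -> cell -> test.
    return cells.get(runs.get(models.get(model_id)))


def total_calls() -> int:
    return sum(source.calls for source in sources)


# Walk on every query: 100 reads cost 3 live calls each.
for _ in range(100):
    walk("model-1")
calls_walk_on_query = total_calls()  # 300

# Materialize once during enrichment, then serve reads locally.
for source in sources:
    source.calls = 0
materialized = {"model-1": walk("model-1")}
for _ in range(100):
    _ = materialized["model-1"]
calls_materialized = total_calls()  # 3
```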

The cost: write amplification¶

Each materialized edge is a write. For a graph with N entities, K derived edges per entity, and an average derivation path length of L hops, materialization performs O(N · K · L) hop traversals over time in order to write O(N · K) derived edges. Compared to the alternative of re-walking on read, materialization trades:

More writes (during enrichment).
More storage (reified edges).
Higher fan-out on entity changes (a change to one entity may invalidate many derived edges that need to be re-materialized).

In exchange for:

Faster reads.
Bidirectional indexability.
Predictable query latency (no live source-API hops).

The tradeoff is the right shape for read-heavy catalog / lineage / discovery workloads, where each entity is read many times for each time it changes.
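A back-of-the-envelope cost model makes the read-heavy condition concrete. This is a deliberately crude sketch: it counts only hops, assumes each change triggers one full re-walk, and the read and change counts are made-up numbers rather than figures from the source.

```python
def walk_on_read_cost(reads: int, path_len: int) -> int:
    # Every read re-walks the full path.
    return reads * path_len


def materialized_cost(reads: int, changes: int, path_len: int) -> int:
    # Each change re-derives the edge once; each read is a single edge lookup.
    return changes * path_len + reads


PATH_LEN = 3  # the three-hop model-to-test path from the worked example

# Read-heavy entity: 1000 reads for every 10 changes.
read_heavy = (walk_on_read_cost(1000, PATH_LEN), materialized_cost(1000, 10, PATH_LEN))
# → (3000, 1030): materialization wins

# Churn-heavy entity: as many changes as reads.
churn_heavy = (walk_on_read_cost(10, PATH_LEN), materialized_cost(10, 10, PATH_LEN))
# → (30, 40): re-walking on read is cheaper
```

Under this toy model the break-even point is `reads / changes = path_len / (path_len - 1)`, or 1.5 reads per change for a three-hop path.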

Re-materialization on entity change¶

If an upstream entity changes, derived edges may become stale. The async enrichment loop handles this by:

Marking entities uncached when they change (concepts/async-relationship-inference).
Re-walking and re-materializing affected paths on the next enrichment cycle.

The post is explicit that this is bounded: "newly discovered relationships may appear with a short delay after the underlying entities are created (typically minutes rather than seconds)." The same applies to re-materialization after change.
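The mark-then-re-walk loop might look roughly like this. The `EnrichmentLoop` class, its `dependents` map (upstream entity to the entities whose derived edges pass through it), and the sample data are all hypothetical; the source does not describe how MDS tracks which paths a change affects.

```python
from typing import Callable, Dict, List, Set


class EnrichmentLoop:
    def __init__(
        self, derive: Callable[[str], Set[str]], dependents: Dict[str, Set[str]]
    ) -> None:
        self.derive = derive          # re-walks the path for one entity
        self.dependents = dependents  # upstream entity -> affected entities
        self.uncached: Set[str] = set()
        self.derived_edges: Dict[str, Set[str]] = {}

    def on_entity_changed(self, entity_id: str) -> None:
        # Mark the entity, and everything derived through it, as uncached.
        self.uncached.add(entity_id)
        self.uncached.update(self.dependents.get(entity_id, set()))

    def run_cycle(self) -> List[str]:
        # Re-walk and re-materialize every uncached entity, then clear the marks.
        batch = sorted(self.uncached)
        self.uncached.clear()
        for entity_id in batch:
            self.derived_edges[entity_id] = set(self.derive(entity_id))
        return batch


model_run = {"model-1": "run-1"}
run_tests = {"run-1": {"12345"}}


def derive_tests(entity_id: str) -> Set[str]:
    # Only model instances have a run; other entities derive no test edges.
    return set(run_tests.get(model_run.get(entity_id), set()))


loop = EnrichmentLoop(derive_tests, dependents={"run-1": {"model-1"}})
loop.on_entity_changed("model-1")
loop.run_cycle()

run_tests["run-1"] = {"12345", "67890"}      # upstream change in the pipeline run
loop.on_entity_changed("run-1")
stale = set(loop.derived_edges["model-1"])   # still the old edge set
loop.run_cycle()
fresh = loop.derived_edges["model-1"]        # refreshed on the next cycle
```

The window between `on_entity_changed` and the next `run_cycle` is the staleness delay the post describes.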

Distinct from query-time graph traversal¶

Modern graph databases (Neo4j, Datomic with Datalog) can express multi-hop queries directly. Why materialize?

Aspect	Query-time walk	Materialized edge
Query latency	O(path length × per-hop join)	O(1) edge lookup
Storage cost	Just the direct edges	Direct + derived edges
Stale-edge risk	None (fresh on every query)	Yes (until re-materialized)
Cross-system source calls	Live, on every query	Once, during enrichment
Best for	Small graphs, infrequent multi-hop queries	Large graphs, frequent multi-hop queries

For Netflix MDS, the graph spans six source systems and the multi-hop queries (lineage, impact analysis, A/B-test attribution) are the dominant read pattern, not an occasional one. Hence materialization is the right shape.

Seen in¶

sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph — Netflix MDS materializes multi-hop edges (Model Instance ↔ A/B Test, Feature ↔ Consuming Model, Dataset ↔ Downstream Experiment) in async enrichment jobs.