CONCEPT Cited by 1 source

Multi-hop relationship materialization

Definition

Multi-hop relationship materialization is the discipline of walking N-step paths in a graph and writing back direct single-hop edges as new facts, so that future queries hit a single edge instead of an N-hop walk. It spends compute at enrichment time to reduce latency at query time.

The pattern is the graph-database analog of a materialized view on a join — precompute the transitive closure once, query it many times.
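A minimal in-memory sketch of the idea, assuming edges are stored as (source, label, target) triples. The helper names, the derived label, and the sample node names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def build_index(edges):
    """Index (source, label, target) triples by (source, label)."""
    index = defaultdict(set)
    for src, label, dst in edges:
        index[(src, label)].add(dst)
    return index

def walk(index, start, labels):
    """Follow the given sequence of edge labels from start; return the end nodes reached."""
    frontier = {start}
    for label in labels:
        frontier = {dst for node in frontier for dst in index[(node, label)]}
    return frontier

def materialize_path(edges, start, labels, derived_label):
    """Walk an N-hop path once and return one direct edge per endpoint reached."""
    index = build_index(edges)
    return {(start, derived_label, end) for end in walk(index, start, labels)}

# Hypothetical two-hop example: a -r1-> b -r2-> {c, d}
edges = [
    ("a", "r1", "b"),
    ("b", "r2", "c"),
    ("b", "r2", "d"),
]
derived = materialize_path(edges, "a", ["r1", "r2"], "r1_r2")
```

Once the derived triples are stored alongside the originals, a later query for the `r1_r2` label on `a` is a single index lookup rather than a two-hop walk.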

The Netflix MDS worked example

"MDS doesn't just store what it's told; it derives new knowledge by walking the graph in the background." (sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph)

Connecting a model instance to its A/B tests, in three hops:

Model Instance --produced-by--> Pipeline Run
Pipeline Run   --executed-for--> A/B Test Cell
A/B Test Cell  --belongs-to-->   A/B Test

Steps the enrichment job runs:

  1. Direct link to pipeline. The model instance has a pipeline_run_id foreign key. Hydrate the pipeline run via GET /api/v1/pipeline-runs/train-weekly-ranking-20XX0101. The response reveals an ab_test_cells field.
  2. Discover A/B test context. For each cell, hydrate the test via GET /api/v1/tests/12345. The response reveals the test name, status, and other cells (control + treatment).
  3. Infer transitive relationship. The enrichment job now has the chain. It writes the inferred edge Model Instance ↔ A/B Test back to Datomic and triggers re-indexing.

"The job writes the inferred relationship back to Datomic and triggers re-indexing, and materializes these edges in the graph." (sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph)
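The three enrichment steps above can be sketched as follows. This is an illustration only: the dictionaries stand in for the source-system APIs, the edge store is a plain Python set rather than Datomic, and every field name other than pipeline_run_id and ab_test_cells (including test_id, cell_id, and the "tested-in" label) is an assumption.

```python
# Hypothetical in-memory stand-ins for the source-system APIs.
# Response shapes are assumptions; only pipeline_run_id and ab_test_cells come from the text.
PIPELINE_RUNS = {
    "train-weekly-ranking-20XX0101": {
        "ab_test_cells": [{"cell_id": "cell-1", "test_id": "12345"}],
    },
}
AB_TESTS = {
    "12345": {"name": "example-test", "status": "running"},
}

def hydrate_pipeline_run(run_id):
    """Stand-in for GET /api/v1/pipeline-runs/<run_id>."""
    return PIPELINE_RUNS[run_id]

def hydrate_test(test_id):
    """Stand-in for GET /api/v1/tests/<test_id>."""
    return AB_TESTS[test_id]

def enrich(model_instance, edge_store):
    """Walk model instance -> pipeline run -> cell -> test and record direct edges."""
    # Step 1: follow the foreign key to the pipeline run.
    run = hydrate_pipeline_run(model_instance["pipeline_run_id"])
    for cell in run.get("ab_test_cells", []):
        # Step 2: hydrate the test that owns this cell (raises KeyError if unknown).
        hydrate_test(cell["test_id"])
        # Step 3: write the inferred direct edge (hypothetical "tested-in" label).
        edge_store.add((model_instance["id"], "tested-in", cell["test_id"]))
    return edge_store

store = enrich(
    {"id": "model-1", "pipeline_run_id": "train-weekly-ranking-20XX0101"},
    set(),
)
```

In the system described, the write would go to Datomic and be followed by re-indexing; the set here only shows where the derived fact ends up.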

Why materialize the edges?

Three structural reasons to write back the direct edge instead of re-walking the chain on every query:

1. Query simplicity

After materialization, "Which A/B tests use this model?" is a single edge lookup. Without it, every query would re-walk the 3-hop path through three source systems' data — a 3-system join at query time.

2. Bidirectional queryability

"The reverse query also works: 'What models are being tested in experiment 12345?'" (sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph)

If the edge is materialized, both endpoints can index into it. A walk-on-query approach forces the query engine to choose a direction; a materialized edge is direction-agnostic. See concepts/reified-edge-graph.
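To make the direction-agnostic point concrete, here is a small sketch of an edge index that records each materialized edge under both endpoints. The EdgeIndex class and its method names are hypothetical; a real store would typically maintain such indexes itself.

```python
from collections import defaultdict

class EdgeIndex:
    """Stores each materialized model/test edge and indexes it from both endpoints."""

    def __init__(self):
        self._by_model = defaultdict(set)
        self._by_test = defaultdict(set)

    def add(self, model_id, test_id):
        # One logical edge, two index entries.
        self._by_model[model_id].add(test_id)
        self._by_test[test_id].add(model_id)

    def tests_for_model(self, model_id):
        """Forward query: which A/B tests use this model?"""
        return set(self._by_model.get(model_id, ()))

    def models_for_test(self, test_id):
        """Reverse query: which models are being tested in this experiment?"""
        return set(self._by_test.get(test_id, ()))

idx = EdgeIndex()
idx.add("model-1", "12345")
idx.add("model-2", "12345")
```

Both `idx.tests_for_model("model-1")` and `idx.models_for_test("12345")` are single lookups; neither needs to walk through pipeline runs or test cells.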

3. Cross-source-system path queries become local

Without materialization, a multi-hop query against a 3-source-system path would require live API calls to all three source systems for every query. Materialization moves the multi-source join from query time to enrichment time, and the enrichment cost is amortized across all subsequent reads.

The cost: write amplification

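Each materialized edge is a write. For a graph with N entities, where each entity yields about K derived edges and each derivation walks a path of average length L, enrichment performs O(N · K · L) hop traversals to write O(N · K) derived edges, plus rewrites whenever upstream entities change. Compared to the alternative of re-walking on read, materialization trades: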

  • More writes (during enrichment).
  • More storage (reified edges).
  • Higher fan-out on entity changes (a change to one entity may invalidate many derived edges that need to be re-materialized).

For:

  • Faster reads.
  • Bidirectional indexability.
  • Predictable query latency (no live source-API hops).

The tradeoff is the right shape for read-heavy catalog / lineage / discovery workloads, where each entity is read many times for each time it changes.
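A back-of-envelope way to see where the tradeoff tips, under a simple cost model that is an assumption of this sketch and not taken from the source: re-walking costs path_length × hop_cost on every read, while materializing costs one walk plus one write per entity change and then one cheap lookup per read. All parameter names and units are hypothetical.

```python
def break_even_reads(path_length, hop_cost, lookup_cost, write_cost):
    """Reads per entity change above which materializing is cheaper than re-walking.

    Assumed model (not from the source):
      walk-on-read total = reads * path_length * hop_cost
      materialized total = path_length * hop_cost + write_cost + reads * lookup_cost
    """
    walk_per_read = path_length * hop_cost
    saving_per_read = walk_per_read - lookup_cost
    if saving_per_read <= 0:
        # A lookup is no cheaper than a walk, so materializing never pays off.
        return float("inf")
    return (walk_per_read + write_cost) / saving_per_read

# Arbitrary illustrative units: 3 hops at 10 each, lookup 1, write 5.
threshold = break_even_reads(path_length=3, hop_cost=10, lookup_cost=1, write_cost=5)
```

With these arbitrary numbers the threshold is 35/29, a little over one read per change, which is consistent with the claim that read-heavy workloads favor materialization.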

Re-materialization on entity change

If an upstream entity changes, derived edges may become stale. The async enrichment loop handles this by re-materializing the affected edges on a later pass.

The post is explicit that this is bounded: "newly discovered relationships may appear with a short delay after the underlying entities are created (typically minutes rather than seconds)." The same applies to re-materialization after change.
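The source does not describe how stale edges are found. One common approach, sketched here purely as an assumption, is to record which upstream entities each derived edge was computed through, so that a change to any of them selects exactly the edges to re-derive. The DerivedEdgeStore class and all identifiers are hypothetical.

```python
from collections import defaultdict

class DerivedEdgeStore:
    """Tracks derived edges together with the upstream entities they depend on."""

    def __init__(self):
        self.edges = set()
        self._depends_on = defaultdict(set)  # upstream entity id -> derived edges

    def add(self, edge, path_entities):
        """Record a derived edge and the intermediate entities on its path."""
        self.edges.add(edge)
        for entity_id in path_entities:
            self._depends_on[entity_id].add(edge)

    def invalidate(self, changed_entity_id):
        """Drop edges derived through the changed entity; return them for re-derivation."""
        stale = self._depends_on.pop(changed_entity_id, set())
        self.edges -= stale
        # Remove the stale edges from every other dependency list as well.
        for deps in self._depends_on.values():
            deps -= stale
        return stale

store = DerivedEdgeStore()
store.add(("model-1", "tested-in", "12345"), ["run-1", "cell-1"])
store.add(("model-2", "tested-in", "67890"), ["run-2", "cell-2"])
stale = store.invalidate("run-1")
```

The returned set would then be handed back to the enrichment loop, which re-walks those paths and writes fresh edges, within the same minutes-scale delay the post describes.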

Distinct from query-time graph traversal

Modern graph databases (Neo4j, Datomic with Datalog) can express multi-hop queries directly. Why materialize?

Aspect                    | Query-time walk                           | Materialized edge
--------------------------|-------------------------------------------|----------------------------------------
Query latency             | O(path length × per-hop join)             | O(1) edge lookup
Storage cost              | Just the direct edges                     | Direct + derived edges
Stale-edge risk           | None (fresh on every query)               | Yes (until re-materialized)
Cross-system source calls | Live, on every query                      | Once, during enrichment
Best for                  | Small graphs, infrequent multi-hop queries | Large graphs, frequent multi-hop queries

For Netflix MDS, the graph spans six source systems and the multi-hop queries (lineage, impact analysis, A/B-test attribution) are the dominant read pattern, not an occasional one. Hence materialization is the right shape.

Seen in
