CONCEPT Cited by 1 source

Reified edge graph

Definition

A reified edge graph is a data-modeling pattern in which relationships between entities are stored as first-class records (facts) rather than as graph-DB-native edges or as values embedded in entity attributes. Each edge is its own queryable entity with provenance and timestamps, and it can be added without rewriting either endpoint.

Reified here means that the relationship is given a material existence as data, rather than being an implicit property of the participating entities.

The Netflix MDS instance

"What we store: All entity attributes as facts. Entity references (foreign keys that may point to entities not yet fully resolved). All relationships as reified edges (added by enrichment processes). Entity lifecycle state (tracking which entities are fully enriched vs awaiting hydration)." (sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph)

Netflix MDS uses Datomic's immutable fact model as the substrate. Each relationship — Model Instance ↔ A/B Test, Feature ↔ Consuming Model, Dataset ↔ Downstream Pipeline — is stored as a fact (or set of facts) that can be:

  • Added without modifying either endpoint entity.
  • Annotated with provenance (which enrichment job derived it, when, what source-system response justified it).
  • Queried from either direction.
  • Retracted without losing history (Datomic appends a retraction fact; the original remains in time-travel queries).
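
The four properties above can be illustrated with a small in-memory sketch. It is not Datomic's API and not Netflix's implementation: `Fact`, `FactLog`, the `tested-in` attribute, and the entity ids are hypothetical names chosen for illustration. The only idea it borrows is that both assertions and retractions are appended to a log, so an earlier state can still be reconstructed.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Fact:
    entity: str      # one endpoint, e.g. a model instance id
    attribute: str   # edge type, e.g. "tested-in" (hypothetical name)
    value: str       # the other endpoint, e.g. an A/B test id
    tx: int          # transaction number, increases with every append
    added: bool      # True = assertion, False = retraction
    source: str      # provenance: which job wrote this fact


class FactLog:
    """Toy append-only log; nothing is ever updated or deleted."""

    def __init__(self) -> None:
        self._facts: List[Fact] = []
        self._tx = 0

    def _append(self, entity: str, attribute: str, value: str,
                added: bool, source: str) -> int:
        self._tx += 1
        self._facts.append(Fact(entity, attribute, value, self._tx, added, source))
        return self._tx

    def add(self, entity: str, attribute: str, value: str, source: str) -> int:
        return self._append(entity, attribute, value, True, source)

    def retract(self, entity: str, attribute: str, value: str, source: str) -> int:
        return self._append(entity, attribute, value, False, source)

    def edges(self, attribute: str, as_of: Optional[int] = None) -> Set[Tuple[str, str]]:
        """Replay the log up to transaction `as_of` (or to the end)."""
        live: Set[Tuple[str, str]] = set()
        for fact in self._facts:
            if fact.attribute != attribute:
                continue
            if as_of is not None and fact.tx > as_of:
                continue
            pair = (fact.entity, fact.value)
            if fact.added:
                live.add(pair)
            else:
                live.discard(pair)
        return live

    def history(self, attribute: str) -> List[Fact]:
        return [f for f in self._facts if f.attribute == attribute]


log = FactLog()
t1 = log.add("model-1", "tested-in", "abtest-12345", source="enrich-ab-tests")
t2 = log.retract("model-1", "tested-in", "abtest-12345", source="enrich-ab-tests")

current = log.edges("tested-in")          # edge is gone in the current view
past = log.edges("tested-in", as_of=t1)   # but still visible as of t1
```

Here `current` is empty while `past` still contains the edge, and `log.history("tested-in")` returns both the assertion and the retraction.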

Why reify?

1. Continuous edge addition without endpoint mutation

"Its immutable fact model means we can continuously add relationships without losing the original entity state." (sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph)

In a relational schema that models relationships as foreign-key columns, adding a relationship means updating one or both endpoint rows. With reified edges, the relationship is its own record and the endpoints stay untouched. This matters because enrichment is concurrent: multiple background jobs may derive edges at the same time, and row-level write contention on the endpoint entities would serialize them.
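
As a rough illustration of that point, the sketch below runs two hypothetical enrichment jobs in threads. Each job only appends edge records to a shared log; neither reads or writes the endpoint entities. The job names, entity ids, and the `tested-in` edge type are assumptions, and the small lock guards only this toy log's append, not any entity.

```python
import copy
import threading

# Endpoint entities; the jobs below never write to these.
entities = {
    "model-1": {"kind": "model"},
    "abtest-1": {"kind": "ab_test"},
    "abtest-2": {"kind": "ab_test"},
}
before = copy.deepcopy(entities)

edge_log = []                 # append-only list of edge records
log_lock = threading.Lock()   # protects the log append only


def enrichment_job(job_name, derived_pairs):
    """Append one edge record per derived (source, target) pair."""
    for source, target in derived_pairs:
        record = {"type": "tested-in", "from": source, "to": target, "job": job_name}
        with log_lock:
            edge_log.append(record)


jobs = [
    threading.Thread(target=enrichment_job, args=("job-a", [("model-1", "abtest-1")])),
    threading.Thread(target=enrichment_job, args=("job-b", [("model-1", "abtest-2")])),
]
for job in jobs:
    job.start()
for job in jobs:
    job.join()

# Both edges were recorded and the endpoint entities are unchanged.
endpoints_untouched = entities == before
```

Both jobs attach an edge to `model-1`, yet neither had to take a write on the `model-1` entity itself.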

2. Bidirectional queryability

A reified edge is symmetric in storage: it has both endpoints as attributes, so the query engine can index from either side. This is the structural reason Netflix MDS supports both "Which A/B tests use this model?" and "What models are being tested in experiment 12345?" with equal efficiency (concepts/multi-hop-relationship-materialization).
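
A minimal sketch of why storing both endpoints makes either direction cheap: the same edge is indexed once by source and once by target. `EdgeIndex` and the identifiers are hypothetical; a real store would maintain such indexes internally.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class EdgeIndex:
    """Toy two-way index over (edge_type, source, target) edges."""

    def __init__(self) -> None:
        self._by_source: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
        self._by_target: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add(self, edge_type: str, source: str, target: str) -> None:
        # One edge, two index entries: lookups from either side are symmetric.
        self._by_source[(edge_type, source)].add(target)
        self._by_target[(edge_type, target)].add(source)

    def targets(self, edge_type: str, source: str) -> Set[str]:
        return set(self._by_source.get((edge_type, source), ()))

    def sources(self, edge_type: str, target: str) -> Set[str]:
        return set(self._by_target.get((edge_type, target), ()))


index = EdgeIndex()
index.add("tested-in", "model-1", "abtest-12345")
index.add("tested-in", "model-2", "abtest-12345")

# "Which A/B tests use this model?"
tests_for_model = index.targets("tested-in", "model-1")
# "What models are being tested in experiment 12345?"
models_in_test = index.sources("tested-in", "abtest-12345")
```

Both questions become a single dictionary lookup, which is the symmetry the paragraph above describes.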

3. Provenance and audit

Each reified edge carries metadata:

  • Which enrichment job created it.
  • Which source-system API call justified it.
  • When it was created.
  • Whether it has been retracted (and by what subsequent fact).

This is invaluable for debugging incorrect lineage. "Why does the graph think this model is connected to that test?" can be answered by inspecting the edge's provenance, not by re-running the enrichment manually.
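
The debugging question above can be sketched as a lookup over edge records that carry their own metadata. The field names, the job name, the API path, and the timestamp are all hypothetical placeholders, not values from the Netflix system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EdgeRecord:
    edge_type: str
    source: str
    target: str
    job: str                            # which enrichment job created the edge
    justification: str                  # which source-system call justified it
    created_at: str                     # when it was created
    retracted_by: Optional[str] = None  # id of a later retracting fact, if any


def explain(edges: List[EdgeRecord], source: str, target: str) -> List[str]:
    """Answer 'why does the graph think source is connected to target?'"""
    lines = []
    for edge in edges:
        if edge.source == source and edge.target == target:
            status = f"retracted by {edge.retracted_by}" if edge.retracted_by else "active"
            lines.append(
                f"{edge.edge_type}: derived by {edge.job} at {edge.created_at} "
                f"from {edge.justification} ({status})"
            )
    return lines


edges = [
    EdgeRecord(
        edge_type="tested-in",
        source="model-1",
        target="abtest-12345",
        job="enrich-ab-tests",                            # hypothetical job name
        justification="GET /hypothetical/abtests/12345",  # hypothetical call
        created_at="2026-01-01T00:00:00Z",                # placeholder timestamp
    )
]
report = explain(edges, "model-1", "abtest-12345")
```

The report is built purely from stored metadata; nothing is re-derived from the source systems.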

4. Schema evolution

"Flexible schema evolution: Easy to add new entity types and attributes as the catalog grows." (sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph)

A new edge type is just a new attribute. Adding one is an additive change: no table migration, no ALTER TABLE, and no breaking changes for existing edges.
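
A small sketch of what "additive" means here, assuming a store that asks for an attribute to be declared before it is used. `define_edge_type`, `add_edge`, and the `consumed-by` attribute are hypothetical names; the point is only that declaring a new edge type leaves existing edges as they were.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (edge_type, source, target)


def define_edge_type(schema: Dict[str, str], name: str, doc: str) -> None:
    """Declare a new edge type; existing types and edges are not touched."""
    if name in schema:
        raise ValueError(f"edge type {name!r} already defined")
    schema[name] = doc


def add_edge(schema: Dict[str, str], edges: List[Edge],
             edge_type: str, source: str, target: str) -> None:
    if edge_type not in schema:
        raise KeyError(f"unknown edge type {edge_type!r}")
    edges.append((edge_type, source, target))


schema = {"tested-in": "model instance -> A/B test"}
edges: List[Edge] = []
add_edge(schema, edges, "tested-in", "model-1", "abtest-12345")
existing = list(edges)

# Later, the catalog grows: declare a new relationship and start using it.
define_edge_type(schema, "consumed-by", "feature -> consuming model")
add_edge(schema, edges, "consumed-by", "feature-7", "model-1")
```

After the new type is declared, `edges[:1]` is still equal to `existing`: old edges needed no rewrite.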

Distinct from native graph-DB edges

| Aspect | Native graph edge (e.g. Neo4j relationship) | Reified edge |
|---|---|---|
| Storage form | Built-in primitive in storage engine | Just another fact / record |
| Query language | First-class graph-traversal syntax | Datalog-style fact lookup |
| Provenance | Edge properties (limited) | Full fact-level history with transactions |
| Concurrent addition | Edge-level locks possible | No locks; append-only facts |
| Time-travel | Engine-dependent | Native (Datomic) |
| Schema for edge metadata | Property-bag | Same as any other entity: full attribute set |

Native graph DBs are optimized for traversal; reified edges in an immutable-fact store are optimized for continuous, concurrent, provenance-preserving addition of edges by many writers.

Distinct from foreign-key-as-relationship

In a typical relational model, a relationship is an FK column on one of the entities. This:

  • Mutates an entity to add a relationship.
  • Has implicit direction (FK from A to B; querying B → A requires an index).
  • Has no per-edge metadata (FK is just a value).
  • Doesn't survive entity deletion (CASCADE / SET NULL).

Reified edges decouple all of this: the edge is its own thing, queryable independently, queryable bidirectionally, with its own metadata, and surviving endpoint changes (subject to integrity rules).
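
For comparison, the same decoupling can be approximated inside a relational database by giving edges their own table. The sketch below uses Python's built-in `sqlite3` module; the table and column names are hypothetical, and it deliberately declares no foreign-key constraints, so the edge row outlives its endpoint. A real design would choose its integrity rules explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE entity (id TEXT PRIMARY KEY, kind TEXT NOT NULL);

    -- The edge is its own row, with its own metadata columns.
    CREATE TABLE edge (
        id         INTEGER PRIMARY KEY,
        edge_type  TEXT NOT NULL,
        source_id  TEXT NOT NULL,
        target_id  TEXT NOT NULL,
        derived_by TEXT NOT NULL,
        created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    -- One index per direction, so neither traversal is favored.
    CREATE INDEX edge_by_source ON edge (edge_type, source_id);
    CREATE INDEX edge_by_target ON edge (edge_type, target_id);
    """
)

conn.executemany(
    "INSERT INTO entity (id, kind) VALUES (?, ?)",
    [("model-1", "model"), ("abtest-12345", "ab_test")],
)
# Adding the relationship writes one edge row and updates no entity row.
conn.execute(
    "INSERT INTO edge (edge_type, source_id, target_id, derived_by) VALUES (?, ?, ?, ?)",
    ("tested-in", "model-1", "abtest-12345", "enrich-ab-tests"),
)

# Deleting an endpoint does not cascade, because no FK constraint was declared.
conn.execute("DELETE FROM entity WHERE id = ?", ("model-1",))

remaining = conn.execute(
    "SELECT source_id, target_id, derived_by FROM edge "
    "WHERE edge_type = ? AND target_id = ?",
    ("tested-in", "abtest-12345"),
).fetchall()
```

The edge row is still returned after the `model-1` entity row is deleted, which is the "surviving endpoint changes" behavior described above.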

When to use it

  • Graph is built incrementally by many independent processes (enrichment jobs, ETL pipelines, manual annotations).
  • Edges have meaningful metadata (provenance, confidence, derivation method).
  • Bidirectional traversal is a first-class requirement.
  • Schema evolution is frequent.
  • Time-travel queries / audit are required.

When not to use it

  • The graph is small enough that a relational FK schema suffices.
  • Query performance dominates over write flexibility — a native graph DB may be faster for pure traversal.
  • Edge metadata is minimal or non-existent.

Seen in

  • sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph
