CONCEPT Cited by 1 source

Reified edge graph

Definition

A reified edge graph is a data-modeling pattern in which relationships between entities are stored as first-class records (facts) rather than as native graph-database edges or as values embedded in entity attributes. Each edge is a queryable entity in its own right, with its own provenance and timestamps, and new edges can be appended without rewriting either endpoint.

Here, "reified" means that the relationship is given a concrete existence as data instead of being an implicit property of the participating entities.
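As a minimal sketch of the idea, assuming Python and entirely hypothetical names (`Entity`, `Edge`, `created_by`, and the ids below are illustrative, not Netflix MDS's schema), a reified edge is simply a record that names both endpoints and carries its own metadata:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Entity:
    # Hypothetical entity record; frozen so it cannot be mutated in place.
    entity_id: str
    kind: str


@dataclass(frozen=True)
class Edge:
    # Hypothetical reified edge: the relationship is a record of its own.
    edge_id: str
    relation: str      # e.g. "tested_in"
    source: str        # entity_id of one endpoint
    target: str        # entity_id of the other endpoint
    created_by: str    # provenance: which job derived the edge
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


entities = {
    "model:42": Entity("model:42", "ModelInstance"),
    "abtest:12345": Entity("abtest:12345", "ABTest"),
}
edges: list[Edge] = []

# Adding a relationship appends a new Edge; neither Entity is rewritten.
edges.append(Edge("e1", "tested_in", "model:42", "abtest:12345", "abtest-enricher"))
```

Freezing both dataclasses makes the point structurally: the only way to record a new relationship is to append another `Edge`.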

The Netflix MDS instance

"What we store: All entity attributes as facts. Entity references (foreign keys that may point to entities not yet fully resolved). All relationships as reified edges (added by enrichment processes). Entity lifecycle state (tracking which entities are fully enriched vs awaiting hydration)." — sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph

Netflix MDS uses Datomic's immutable fact model as the substrate. Each relationship — Model Instance ↔ A/B Test, Feature ↔ Consuming Model, Dataset ↔ Downstream Pipeline — is stored as a fact (or set of facts) that can be:

Added without modifying either endpoint entity.
Annotated with provenance (which enrichment job derived it, when, what source-system response justified it).
Queried from either direction.
Retracted without losing history (Datomic appends a retraction fact; the original remains in time-travel queries).
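The retraction behavior can be sketched with a toy append-only log. It borrows the shape of a Datomic datom (entity, attribute, value, transaction, added flag), but `FactLog`, `transact`, `as_of`, and the attribute keyword are hypothetical illustrations, not Datomic's API:

```python
from typing import NamedTuple


class Datom(NamedTuple):
    e: str       # entity id
    a: str       # attribute, e.g. ":model/tested-in" (hypothetical)
    v: str       # value; for an edge, the other endpoint's id
    tx: int      # transaction number (monotonically increasing)
    added: bool  # True = assertion, False = retraction


class FactLog:
    """Toy append-only fact log standing in for an immutable fact store."""

    def __init__(self) -> None:
        self.datoms: list[Datom] = []
        self.tx = 0

    def transact(self, ops: list[tuple[str, str, str, bool]]) -> int:
        """Append every op as a datom in one new transaction; never overwrite."""
        self.tx += 1
        for e, a, v, added in ops:
            self.datoms.append(Datom(e, a, v, self.tx, added))
        return self.tx

    def as_of(self, tx: int) -> set[tuple[str, str, str]]:
        """Facts true as of transaction tx: replay assertions and retractions in order."""
        current: set[tuple[str, str, str]] = set()
        for d in self.datoms:
            if d.tx > tx:
                break  # datoms are appended in transaction order
            if d.added:
                current.add((d.e, d.a, d.v))
            else:
                current.discard((d.e, d.a, d.v))
        return current


log = FactLog()
t1 = log.transact([("model:42", ":model/tested-in", "abtest:12345", True)])
t2 = log.transact([("model:42", ":model/tested-in", "abtest:12345", False)])
print(log.as_of(t1))  # the edge is visible at t1
print(log.as_of(t2))  # set(): retracted, yet both datoms are still stored
```

The retraction is itself a new datom, so a time-travel read at `t1` still sees the original edge.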

Why reify?

1. Continuous edge addition without endpoint mutation

"Its immutable fact model means we can continuously add relationships without losing the original entity state." — sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph

In a relational schema that models a relationship as a foreign-key column, adding a relationship means updating one or both endpoint rows. With reified edges, the relationship is its own record and the endpoints stay untouched. This matters because enrichment is concurrent: several background jobs may be deriving edges at the same time, and row-level write contention on the endpoint entities would serialize them.
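A minimal sketch of that write pattern, assuming Python threads stand in for independent enrichment jobs and a single in-memory list stands in for the fact store (job names and ids are hypothetical). Both jobs create edges touching the same `model:42` endpoint, yet neither writes to it; the only shared write is the append:

```python
import copy
import threading

# Hypothetical endpoint entities; enrichment jobs never write to these.
entities = {
    "model:42": {"kind": "ModelInstance"},
    "abtest:12345": {"kind": "ABTest"},
    "feature:ctr": {"kind": "Feature"},
}
snapshot = copy.deepcopy(entities)

# Each edge is (source, relation, target, job). The lock guards only the
# append to the shared log, never an individual entity record.
edge_log: list[tuple[str, str, str, str]] = []
log_lock = threading.Lock()


def enrichment_job(job_name: str, derived: list[tuple[str, str, str]]) -> None:
    """Hypothetical background job that appends the edges it derives."""
    for source, relation, target in derived:
        with log_lock:
            edge_log.append((source, relation, target, job_name))


jobs = [
    threading.Thread(
        target=enrichment_job,
        args=("abtest-enricher", [("model:42", "tested_in", "abtest:12345")]),
    ),
    threading.Thread(
        target=enrichment_job,
        args=("feature-enricher", [("feature:ctr", "consumed_by", "model:42")]),
    ),
]
for job in jobs:
    job.start()
for job in jobs:
    job.join()
```

In Datomic the equivalent serialization point is the transactor, which orders all transactions; what the reified model avoids is read-modify-write contention on the endpoint entities themselves.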

2. Bidirectional queryability

A reified edge is symmetric in storage: it has both endpoints as attributes, so the query engine can index from either side. This is the structural reason Netflix MDS supports both "Which A/B tests use this model?" and "What models are being tested in experiment 12345?" with equal efficiency (concepts/multi-hop-relationship-materialization).
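The symmetry can be sketched with two hypothetical in-memory indexes, one keyed by source and one by target, loosely analogous to how Datomic's VAET index supports reverse lookup over reference attributes. `EdgeIndex` and its method names are illustrative only:

```python
from collections import defaultdict


class EdgeIndex:
    """Hypothetical dual index over reified edges, keyed from both endpoints."""

    def __init__(self) -> None:
        self.forward: defaultdict[tuple[str, str], set[str]] = defaultdict(set)
        self.reverse: defaultdict[tuple[str, str], set[str]] = defaultdict(set)

    def add(self, source: str, relation: str, target: str) -> None:
        # One stored edge feeds both indexes, so neither direction is second-class.
        self.forward[(source, relation)].add(target)
        self.reverse[(target, relation)].add(source)

    def targets(self, source: str, relation: str) -> list[str]:
        # .get() avoids creating empty entries for unknown keys.
        return sorted(self.forward.get((source, relation), set()))

    def sources(self, target: str, relation: str) -> list[str]:
        return sorted(self.reverse.get((target, relation), set()))


idx = EdgeIndex()
idx.add("model:42", "tested_in", "abtest:12345")
idx.add("model:7", "tested_in", "abtest:12345")
idx.add("model:42", "tested_in", "abtest:999")

print(idx.targets("model:42", "tested_in"))     # which A/B tests use this model?
print(idx.sources("abtest:12345", "tested_in"))  # which models are in this experiment?
```

Both questions are a single dictionary lookup; the cost is maintaining two index entries per edge.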

3. Provenance and audit

Each reified edge carries metadata:

Which enrichment job created it.
Which source-system API call justified it.
When it was created.
Whether it has been retracted (and by what subsequent fact).

This is invaluable for debugging incorrect lineage. "Why does the graph think this model is connected to that test?" can be answered by inspecting the edge's provenance, not by re-running the enrichment manually.
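A minimal sketch of such a "why" query, assuming each edge fact stores its provenance inline. `EdgeFact`, `explain`, the job names, and the evidence strings are hypothetical placeholders, not Netflix MDS data:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeFact:
    # Hypothetical edge fact carrying its own provenance.
    source: str
    relation: str
    target: str
    job: str            # enrichment job that asserted or retracted the edge
    evidence: str       # hypothetical pointer to the justifying source-system response
    tx: int             # transaction that recorded this fact
    added: bool = True  # False marks a retraction


def explain(facts: list[EdgeFact], source: str, target: str) -> list[EdgeFact]:
    """Answer 'why does the graph link source to target?' from stored facts."""
    return sorted(
        (f for f in facts if f.source == source and f.target == target),
        key=lambda f: f.tx,
    )


facts = [
    EdgeFact("model:42", "tested_in", "abtest:12345", "abtest-enricher",
             "abtest-api response #1", tx=3),
    EdgeFact("model:7", "tested_in", "abtest:12345", "abtest-enricher",
             "abtest-api response #2", tx=4),
    EdgeFact("model:42", "tested_in", "abtest:12345", "lineage-cleanup",
             "abtest-api response #3", tx=9, added=False),
]

for f in explain(facts, "model:42", "abtest:12345"):
    action = "asserted" if f.added else "retracted"
    print(f"tx {f.tx}: {action} by {f.job} ({f.evidence})")
```

The answer is a lookup over facts that already exist; nothing is re-derived.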

4. Schema evolution

"Flexible schema evolution: Easy to add new entity types and attributes as the catalog grows." — sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph

A new edge type is just a new attribute. In Datomic the attribute definition is itself transacted as data, so there is no ALTER TABLE-style migration, no table rewrite, and no breaking change for existing edges.
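In Datomic, attributes are defined by transacting ordinary data (`:db/ident`, `:db/valueType`, `:db/cardinality`). The toy registry below mirrors that idea in Python without using Datomic's API; `define_attribute`, `assert_fact`, and the attribute names are hypothetical:

```python
# Hypothetical schema registry: attribute definitions are plain data.
schema: dict[str, dict[str, str]] = {
    ":model/tested-in": {"value_type": "ref", "cardinality": "many"},
}
facts: list[tuple[str, str, str]] = [
    ("model:42", ":model/tested-in", "abtest:12345"),
]


def define_attribute(ident: str, value_type: str, cardinality: str) -> None:
    """Register a new edge type by adding a schema entry; nothing is migrated."""
    if ident in schema:
        raise ValueError(f"attribute {ident} already defined")
    schema[ident] = {"value_type": value_type, "cardinality": cardinality}


def assert_fact(entity: str, attribute: str, value: str) -> None:
    """Append a fact, rejecting attributes the schema does not know about."""
    if attribute not in schema:
        raise KeyError(f"unknown attribute {attribute}")
    facts.append((entity, attribute, value))


# A new edge type appears as the catalog grows; existing facts are untouched.
define_attribute(":feature/consumed-by", "ref", "many")
assert_fact("feature:ctr", ":feature/consumed-by", "model:42")
```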

Distinct from native graph-DB edges

Aspect	Native graph edge (e.g. Neo4j relationship)	Reified edge
Storage form	Built-in primitive in storage engine	Just another fact / record
Query language	First-class graph-traversal syntax	Datalog-style fact lookup
Provenance	Edge properties (limited)	Full fact-level history with transactions
Concurrent addition	Edge-level locks possible	No per-entity locks; append-only facts (Datomic still serializes transactions through its transactor)
Time-travel	Engine-dependent	Native (Datomic)
Schema for edge metadata	Property-bag	Same as any other entity — full attribute set

Native graph DBs are optimized for traversal; reified edges in an immutable-fact store are optimized for continuous, concurrent, provenance-preserving addition of edges by many writers.
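To make the "Datalog-style fact lookup" row concrete, here is a toy pattern matcher over (entity, attribute, value) triples. It is not Datomic's query engine; `match`, `query`, and the `?`-prefixed variable convention are assumptions for illustration. Joining two patterns on the shared variable `?m` answers a two-hop question, namely which features feed models tested in experiment 12345:

```python
from typing import Iterator, Optional

Fact = tuple[str, str, str]
Binding = dict[str, str]


def match(pattern: Fact, fact: Fact, binding: Binding) -> Optional[Binding]:
    """Unify one pattern with one fact; terms starting with '?' are variables."""
    result = dict(binding)
    for term, value in zip(pattern, fact):
        if term.startswith("?"):
            if term in result and result[term] != value:
                return None  # variable already bound to a different value
            result[term] = value
        elif term != value:
            return None  # constant does not match
    return result


def query(patterns: list[Fact], facts: list[Fact],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every binding that satisfies all patterns (a nested-loop join)."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for fact in facts:
        extended = match(first, fact, binding)
        if extended is not None:
            yield from query(rest, facts, extended)


facts: list[Fact] = [
    ("model:42", "tested_in", "abtest:12345"),
    ("model:7", "tested_in", "abtest:999"),
    ("feature:ctr", "consumed_by", "model:42"),
    ("feature:age", "consumed_by", "model:7"),
]

results = list(query([("?m", "tested_in", "abtest:12345"),
                      ("?f", "consumed_by", "?m")], facts))
print(results)  # [{'?m': 'model:42', '?f': 'feature:ctr'}]
```

A real engine would use indexes instead of scanning every fact per pattern, but the shape of the query is the same: edges are just facts matched by pattern.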

Distinct from foreign-key-as-relationship

In a typical relational model, a relationship is an FK column on one of the entities. This:

Mutates an entity to add a relationship.
Has implicit direction (FK from A to B; querying B → A requires an index).
Has no per-edge metadata (FK is just a value).
Doesn't survive entity deletion (CASCADE / SET NULL).

Reified edges decouple all of this: the edge is a record in its own right, queryable on its own and from either direction, with its own metadata, and able to survive endpoint changes (subject to integrity rules).
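The contrast can be sketched in a few lines, assuming plain Python dicts stand in for a relational row and for edge records; `model_row`, `edges`, and the `dangling` integrity rule are hypothetical. A single FK column holds one value, so recording a second test overwrites the first, while reified edges accumulate, each with its own metadata:

```python
import copy

# FK style: the relationship lives in a column on one endpoint row.
model_row = {"id": "model:42", "abtest_id": None}
model_row["abtest_id"] = "abtest:12345"  # adding the link mutates the entity
model_row["abtest_id"] = "abtest:999"    # a second test overwrites the first

# Reified style: each link is its own record; the entity is never touched.
model_entity = {"id": "model:42"}
snapshot = copy.deepcopy(model_entity)
edges = [
    {"source": "model:42", "relation": "tested_in",
     "target": "abtest:12345", "job": "abtest-enricher"},
    {"source": "model:42", "relation": "tested_in",
     "target": "abtest:999", "job": "abtest-enricher"},
]


def dangling(edges: list[dict], live_ids: set[str]) -> list[dict]:
    """Hypothetical integrity rule: edges whose endpoints no longer exist."""
    return [e for e in edges
            if e["source"] not in live_ids or e["target"] not in live_ids]


# Suppose abtest:999 has been removed: its edge survives and gets flagged.
flagged = dangling(edges, {"model:42", "abtest:12345"})
```

The `dangling` check illustrates the "subject to integrity rules" caveat: because edges outlive endpoint changes, some rule has to decide what a dangling edge means.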

When to use it

Graph is built incrementally by many independent processes (enrichment jobs, ETL pipelines, manual annotations).
Edges have meaningful metadata (provenance, confidence, derivation method).
Bidirectional traversal is a first-class requirement.
Schema evolution is frequent.
Time-travel queries / audit are required.

When not to use it

The graph is small enough that a relational FK schema suffices.
Traversal performance matters more than write flexibility; a native graph DB may be faster for pure traversal.
Edge metadata is minimal or non-existent.

Seen in

sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph — Netflix MDS stores all relationships as reified edges in Datomic, enabling continuous async enrichment without entity mutation.