Datomic

Datomic is a transactional database with an immutable fact model: every transaction appends new facts, and existing facts are never overwritten. This produces a queryable history (time-travel queries) and a graph-native data model where relationships can be modeled as first-class facts. Datomic was originally created by Rich Hickey (the author of Clojure) and Cognitect; Cognitect is now part of Nubank.

This page is a stub created for cross-referencing from sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph, where Datomic is the storage substrate for Netflix's MDS model lifecycle graph.

Core data model: facts

Datomic stores data as facts of the form [entity, attribute, value, transaction, op], where op marks the fact as an assertion or a retraction. New facts are appended; a retraction is itself a new fact stating that an earlier fact no longer holds, while the original fact remains in history. This produces:

  • Time-travel queries — query the database as-of any past point.
  • Audit trail by construction — every change is a fact with a transaction reference.
  • Schema evolution without migration — adding new attributes is just adding facts; no ALTER TABLE.
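
The fact model above can be illustrated with a toy append-only log. This is a minimal sketch in Python, not Datomic's API: the `Datom` and `FactLog` names, the integer transaction ids, and the attribute names are all assumptions made for illustration.

```python
from typing import Any, NamedTuple


class Datom(NamedTuple):
    e: int       # entity id
    a: str       # attribute name
    v: Any       # value (must be hashable in this toy)
    tx: int      # id of the transaction that wrote this fact
    added: bool  # True = assertion, False = retraction


class FactLog:
    """Toy append-only store; hypothetical, not Datomic's real interface."""

    def __init__(self):
        self._log = []
        self._next_tx = 1

    def transact(self, changes):
        """Append a batch of (op, e, a, v) tuples as one transaction."""
        tx = self._next_tx
        self._next_tx += 1
        for op, e, a, v in changes:
            self._log.append(Datom(e, a, v, tx, op == "add"))
        return tx

    def as_of(self, tx):
        """Replay the log up to tx and return the (e, a, v) facts that hold."""
        current = set()
        for d in self._log:
            if d.tx > tx:
                break  # the log is ordered by transaction id
            if d.added:
                current.add((d.e, d.a, d.v))
            else:
                current.discard((d.e, d.a, d.v))
        return current

    def history(self, e, a):
        """Every assertion and retraction ever written for (e, a)."""
        return [d for d in self._log if d.e == e and d.a == a]


log = FactLog()
t1 = log.transact([("add", 1, "model/name", "ranker-v1")])
t2 = log.transact([("retract", 1, "model/name", "ranker-v1"),
                   ("add", 1, "model/name", "ranker-v2")])
# A brand-new attribute is just another fact; nothing is migrated.
t3 = log.transact([("add", 1, "model/owner", "team-a")])
```

Querying `log.as_of(t1)` still returns the original name after the rename in `t2`, and `log.history(1, "model/name")` shows all three writes, which is the audit trail in miniature.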

Why Netflix MDS chose Datomic

"Datomic serves as both the system of record for MDS and the working dataset for enrichment processes. Its immutable fact model means we can continuously add relationships without losing the original entity state."

sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph

The load-bearing property: enrichment jobs continuously append new edges as they walk multi-hop paths in the graph (concepts/multi-hop-relationship-materialization). On a mutable store this would require careful ordering of read-modify-write transactions to avoid losing concurrent writes. On Datomic's append-only fact model, every new edge is just a new fact, so concurrent enrichment jobs can each append their own edges without coordinating with one another.
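
The difference between the two write patterns can be shown with a deterministic toy interleaving. This is a sketch under stated assumptions: the store layouts and the names `mutable_store` and `fact_log` are hypothetical, and it does not model how Datomic itself orders transactions.

```python
# Mutable store: one record holds the whole edge list for an entity.
mutable_store = {"model-1": ["feature-a"]}

# Two enrichment jobs read the same snapshot before either one writes.
snapshot_job1 = list(mutable_store["model-1"])
snapshot_job2 = list(mutable_store["model-1"])

# Each job writes back its own modified copy; the second write
# silently overwrites the edge added by the first (a lost update).
mutable_store["model-1"] = snapshot_job1 + ["feature-b"]
mutable_store["model-1"] = snapshot_job2 + ["feature-c"]
lost_update_result = mutable_store["model-1"]

# Append-only store: every edge is its own fact, so neither job has to
# read the other's state before writing.
fact_log = [("model-1", "uses-feature", "feature-a")]
fact_log.append(("model-1", "uses-feature", "feature-b"))  # job 1
fact_log.append(("model-1", "uses-feature", "feature-c"))  # job 2

edges = sorted(v for e, a, v in fact_log
               if e == "model-1" and a == "uses-feature")
```

In the mutable case `feature-b` is gone after both writes; in the append-only case all three edges survive regardless of which job appends first.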

What MDS stores in Datomic:

  • All entity attributes as facts.
  • Entity references (foreign keys, possibly to entities not yet fully resolved).
  • All relationships as reified edges added by enrichment processes. See concepts/reified-edge-graph.
  • Entity lifecycle state (uncached / partially-resolved / fully-enriched).
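
One way to picture this storage shape is a flat list of facts in which each relationship is its own entity. The sketch below is an illustration only: the attribute names (`entity/state`, `edge/from`, `edge/to`, `edge/label`, `edge/added-by`) and job names are assumptions, not the schema described in the Netflix post.

```python
import itertools

_ids = itertools.count(1)  # hypothetical entity-id allocator
facts = []                 # append-only list of (entity, attribute, value)


def assert_fact(e, a, v):
    facts.append((e, a, v))


def add_entity(kind, name, state):
    """Store every attribute, including lifecycle state, as a fact."""
    e = next(_ids)
    assert_fact(e, "entity/kind", kind)
    assert_fact(e, "entity/name", name)
    assert_fact(e, "entity/state", state)
    return e


def add_edge(src, dst, label, added_by):
    """Reify the relationship: the edge is itself an entity with attributes."""
    edge = next(_ids)
    assert_fact(edge, "edge/from", src)
    assert_fact(edge, "edge/to", dst)
    assert_fact(edge, "edge/label", label)
    assert_fact(edge, "edge/added-by", added_by)
    return edge


def targets(src, label):
    """Follow all edges with the given label that start at src."""
    from_src = {e for e, a, v in facts if a == "edge/from" and v == src}
    labelled = {e for e, a, v in facts if a == "edge/label" and v == label}
    return sorted(v for e, a, v in facts
                  if a == "edge/to" and e in from_src & labelled)


model = add_entity("model", "ranker", "fully-enriched")
feature = add_entity("feature", "watch-time", "partially-resolved")
# A reference may point at an entity that has not been resolved yet.
dataset = add_entity("dataset", "events", "uncached")

edge1 = add_edge(model, feature, "uses-feature", "feature-enricher")
edge2 = add_edge(feature, dataset, "derived-from", "lineage-enricher")
```

Because the edge is an entity, provenance such as which job added it can be attached as ordinary attributes without changing the entities at either end.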

What this enables for MDS:

  • Complex graph traversals: "Navigate from a model to its features to their data sources in a single query."
  • Entity relationships: "Join across multiple domains without N+1 query problems."
  • Flexible schema evolution: "Easy to add new entity types and attributes as the catalog grows."
  • Progressive enrichment: "Background jobs efficiently identify and process entities requiring additional hydration, enabling gradual graph completion without reprocessing fully enriched entities."
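
The progressive-enrichment idea can be sketched as a loop that selects only entities whose lifecycle state is not yet final. This is a simplified illustration: the three state names come from the list of stored data earlier on this page, while the one-step-per-pass advancement and all function names are assumptions.

```python
STATES = ["uncached", "partially-resolved", "fully-enriched"]

state_facts = []  # append-only (entity, state) pairs; the latest one wins


def set_state(entity, state):
    state_facts.append((entity, state))


def current_states():
    """Fold the append-only log into the latest state per entity."""
    latest = {}
    for entity, state in state_facts:
        latest[entity] = state
    return latest


def needs_hydration():
    """Entities a background job should still pick up."""
    return sorted(e for e, s in current_states().items()
                  if s != "fully-enriched")


def enrichment_pass():
    """Advance each pending entity by one state; skip finished ones."""
    processed = needs_hydration()
    states = current_states()
    for entity in processed:
        next_state = STATES[STATES.index(states[entity]) + 1]
        set_state(entity, next_state)
    return processed


set_state("model-1", "fully-enriched")
set_state("feature-7", "partially-resolved")
set_state("dataset-3", "uncached")

first = enrichment_pass()
second = enrichment_pass()
third = enrichment_pass()
```

Each pass touches fewer entities than the last, and `model-1` is never reprocessed because it starts out fully enriched.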

Use shape: graph traversal

"In practice, we use Datomic for relationship-heavy, navigational queries such as: Starting from this model instance, show me all upstream datasets and downstream experiments. Given this feature, list all consuming models and their owning teams. These queries often span multiple hops in the graph and benefit from Datomic's immutable fact model and efficient joins across entity relationships."

sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph

Datomic's query language is Datalog, which natively expresses recursive graph walks (transitive closure over a relationship edge). This is structurally a better fit for "walk the graph" than SQL recursive CTEs and avoids the N+1 query antipattern of an ORM-driven walk.
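
A recursive Datalog rule typically has a base case (a direct edge) and a recursive case (an edge followed by the rule again). The Python sketch below computes the same transitive closure imperatively so the shape of the walk is visible; the `edges` graph and the `upstream` function are hypothetical, and this is not Datalog syntax or Datomic's query API.

```python
from collections import deque

# Hypothetical "depends on" edges: model -> features -> datasets.
edges = {
    "model-1": ["feature-a", "feature-b"],
    "feature-a": ["dataset-x"],
    "feature-b": ["dataset-x", "dataset-y"],
    "dataset-y": ["dataset-z"],
}


def upstream(start):
    """Return every node reachable from start by following edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:  # visit each node once, even on shared paths
                seen.add(nxt)
                queue.append(nxt)
    return seen


reachable = upstream("model-1")
```

The whole closure is computed in one call over one data structure, which is the contrast with an ORM-driven walk that issues a separate query for every node it visits.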

Seen in

  • sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph

Caveats

  • This wiki page is a stub limited to the role Datomic plays in the Netflix MDS post. Datomic has other capabilities (the peer architecture, Datomic Ions for cloud deployment) that are not covered here, and time-travel queries are mentioned only briefly.
  • The post does not disclose Datomic deployment shape (Pro? Cloud? custom-hosted?), cluster topology, or capacity numbers.