SYSTEM Cited by 1 source

Netflix UDA (Unified Data Architecture)

UDA — Unified Data Architecture — is Netflix Content Engineering's in-house knowledge graph platform that connects business concepts (actor, movie, asset, …) to the many data systems where instance data actually lives (Enterprise GraphQL Gateway, Domain Graph Services, Data Mesh sources, Iceberg tables, Java APIs), and projects each domain model outward into concrete schemas — GraphQL, Avro, SQL, RDF, Java — via a transpiler family (Source: sources/2025-06-14-netflix-model-once-represent-everywhere-uda).

Thesis

"Define a model once, at the conceptual level, and reuse those definitions everywhere … project those definitions outward, generating schemas and enforcing consistency across systems. The conceptual model must become part of the control plane." — this is UDA's headline pattern, patterns/model-once-represent-everywhere — the conceptual model is promoted from documentation artifact to control-plane artifact (concepts/control-plane-data-plane-separation).
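
The pattern in miniature: a minimal sketch of one conceptual definition projected into two target schemas. All names and the dict-shaped "metamodel" here are invented for illustration — UDA's actual domain models are RDF, and its transpilers are not public.

```python
# Sketch of "model once, represent everywhere": a single conceptual
# model projected into GraphQL SDL and SQL DDL. Illustrative only.

DOMAIN_MODEL = {
    "Movie": {
        "title": "String",
        "runtimeMinutes": "Int",
    },
}

SQL_TYPES = {"String": "TEXT", "Int": "INTEGER"}

def to_graphql(model: dict) -> str:
    out = []
    for concept, fields in model.items():
        body = "\n".join(f"  {name}: {typ}" for name, typ in fields.items())
        out.append(f"type {concept} {{\n{body}\n}}")
    return "\n\n".join(out)

def to_sql(model: dict) -> str:
    out = []
    for concept, fields in model.items():
        cols = ",\n".join(f"  {name} {SQL_TYPES[typ]}" for name, typ in fields.items())
        out.append(f"CREATE TABLE {concept.lower()} (\n{cols}\n);")
    return "\n".join(out)

print(to_graphql(DOMAIN_MODEL))
print(to_sql(DOMAIN_MODEL))
```

The point of the control-plane framing is that both projections are *derived*, never hand-edited — drift between the GraphQL type and the warehouse table becomes impossible by construction.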

Problem shape

Netflix's offerings (films / series / games / live events / ads) have multiplied the systems that encode the same core concepts in parallel, with no coordination. The article enumerates four pain points:

  1. Duplicated and inconsistent models — same entity, conflicting definitions across systems.
  2. Inconsistent terminology — even within one system, the same term means different things to different teams.
  3. Data quality issues — identifiers and foreign keys exist but are "inconsistently modeled and poorly documented, requiring manual work from domain experts to find and fix any data issues."
  4. Limited connectivity — within a system, relationships are constrained by what the system supports; across systems they are "effectively non-existent."

Architecture

Capabilities exposed to users and systems

  1. Register and connect domain models — formal conceptualisations of federated business domains, expressed as data (concepts/domain-model).
  2. Catalog + map domain models to data containers — (concepts/data-container): GraphQL type resolvers served by a Domain Graph Service, Data Mesh sources, Iceberg tables.
  3. Transpile domain models — into GraphQL, Avro, SQL, RDF, and Java schema definitions, preserving semantics (patterns/schema-transpilation-from-domain-model).
  4. Move data faithfully between data containers — e.g. federated GraphQL entities → Data Mesh; CDC sources → joinable Iceberg data products. Pipelines auto-provisioned from domain model + mappings, not hand-plumbed.
  5. Discover + explore domain concepts via search and graph traversal.
  6. Programmatically introspect via Java (generated from Upper), GraphQL (federated), or SPARQL (raw graph).
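
Capability 6 in miniature: programmatic introspection is triple-pattern matching over the graph, whatever the surface (Jena, GraphQL, SPARQL). A toy in-memory store, with invented triples, shows the shape:

```python
# Minimal triple-pattern matching, the way a SPARQL query would see the
# graph. Store contents are invented for illustration.

TRIPLES = {
    ("Movie", "isA", "DomainConcept"),
    ("Actor", "isA", "DomainConcept"),
    ("Movie", "hasField", "title"),
    ("Actor", "appearsIn", "Movie"),
}

def match(s=None, p=None, o=None):
    """Return triples matching the pattern; None is a wildcard."""
    return {
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }

# "Which concepts exist?" — roughly SELECT ?c WHERE { ?c :isA :DomainConcept }
concepts = sorted(s for s, _, _ in match(p="isA", o="DomainConcept"))
print(concepts)  # ['Actor', 'Movie']
```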

Foundations

  • RDF + SHACL chosen — "Connecting business concepts to schemas and data containers in a graph-like structure, grounded in strong semantic foundations, naturally led us to consider a knowledge graph approach" — but gaps enumerated:
      • RDF lacks a usable information model (no resolution over named graphs).
      • SHACL is built for native-RDF validation, not for enterprise data structured around local schemas + typed keys.
      • Ontology tooling lacks modular authoring, team ownership, and safe federation primitives (unlike GraphQL Federation).
      • Teams lacked a shared authoring practice; style drift broke transpilation consistency.

  • Named-graph-first info model. UDA's response: "Each named graph conforms to a governing model, itself a named graph in the knowledge graph." This systematic nesting buys resolution, modularity, and governance across the entire graph (concepts/named-graph).
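
The nesting can be sketched as a store of named graphs where each graph declares the governing model it conforms to, and the root model governs itself. Graph names and the `conformsTo` predicate are illustrative, not UDA's actual vocabulary.

```python
# Named-graph-first information model: every graph points at its
# governing model; the root (Upper-like) graph is self-governing.

STORE = {
    "graph:upper":        {("graph:upper", "conformsTo", "graph:upper")},
    "graph:movies-model": {("graph:movies-model", "conformsTo", "graph:upper")},
    "graph:movies-data":  {("graph:movies-data", "conformsTo", "graph:movies-model")},
}

def governing_chain(graph: str) -> list:
    """Walk conformsTo links from a graph up to the self-governing root."""
    chain = [graph]
    while True:
        (_, _, parent), = {t for t in STORE[chain[-1]] if t[1] == "conformsTo"}
        if parent == chain[-1]:
            return chain
        chain.append(parent)

print(governing_chain("graph:movies-data"))
# ['graph:movies-data', 'graph:movies-model', 'graph:upper']
```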

  • Upper — the metamodel. UDA bootstraps from Upper, a self-referencing upper ontology that models itself, defines the concept of a domain model, and conforms to its own model (patterns/self-referencing-metamodel-bootstrap). "All domain models are conservative extensions of Upper."
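
"Conservative extension" has a checkable meaning: a domain model may add terms to Upper but must not change what Upper already says. A hypothetical sketch (the term contents are invented; the real check operates over RDF, not dicts):

```python
# Illustrative check: an extension is conservative if it only adds new
# terms and never redefines a term Upper already defines.

UPPER = {"DomainModel": "the concept of a model", "Concept": "a modeled thing"}

def is_conservative_extension(extension: dict) -> bool:
    # Overlapping terms must carry identical definitions.
    return all(UPPER[k] == v for k, v in extension.items() if k in UPPER)

movies = {"Movie": "a film", "Actor": "a performer"}  # only adds terms
rogue = {"Concept": "something else entirely"}        # redefines an Upper term
print(is_conservative_extension(movies))  # True
print(is_conservative_extension(rogue))   # False
```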

  • System domains are first-class. UDA captures the definition of data systems (GraphQL, Avro, Data Mesh, Mappings) as their own domain models. The same Upper language describes "what a movie is" and "what a Data Mesh source is" — business + system domain duality.

Runtime surface

Upper itself is projected into a generated Jena-based Java API + a GraphQL schema federated into Netflix's Enterprise GraphQL Gateway (systems/netflix-enterprise-graphql-gateway). The transpiler family and UI both consume the same generated APIs — UDA's runtime is, by design, "the metamodel's first customer".

Named production consumers

PDM — Primary Data Management

systems/netflix-pdm — Netflix's platform for authoritative reference data + taxonomies. "PDM turns domain models into flat or hierarchical taxonomies that drive a generated UI for business users. These taxonomy models are projected into Avro and GraphQL schemas, automatically provisioning data products in the Warehouse and GraphQL APIs in the Enterprise Gateway." One domain model drives: the authoring UI, the warehouse Avro schema, and the Enterprise Gateway GraphQL type.
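
The one-model-many-projections claim, sketched for the simplest taxonomy shape — a flat enum emitted as both an Avro schema and a GraphQL type. Taxonomy name and terms are invented; PDM's actual pipeline is not public.

```python
import json

# One taxonomy definition projected into an Avro enum schema and a
# GraphQL enum. Illustrative only.

TAXONOMY = {"name": "ContentType", "terms": ["FILM", "SERIES", "GAME", "LIVE_EVENT"]}

def to_avro(tax: dict) -> str:
    return json.dumps({"type": "enum", "name": tax["name"], "symbols": tax["terms"]})

def to_graphql_enum(tax: dict) -> str:
    body = "\n".join(f"  {t}" for t in tax["terms"])
    return f"enum {tax['name']} {{\n{body}\n}}"

print(to_avro(TAXONOMY))
print(to_graphql_enum(TAXONOMY))
```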

Sphere — self-service operational reporting

systems/netflix-sphere — "Sphere uses UDA to catalog and relate business concepts across systems, enabling discovery through familiar terms like 'actor' or 'movie.' Once concepts are selected, Sphere walks the knowledge graph and generates SQL queries to retrieve data from the warehouse, no manual joins or technical mediation required." Canonical wiki instance of patterns/graph-walk-sql-generation — the path between concept nodes in the knowledge graph encodes the JOIN the human would otherwise have to write by hand.
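
The graph-walk pattern can be sketched as BFS over a concept graph whose edges carry join keys, emitting one JOIN per hop. Tables, keys, and edges below are invented — only the shape of the technique is from the source.

```python
from collections import deque

# Graph-walk SQL generation: find the path between two concepts, then
# translate each edge on the path into a JOIN. Schema is illustrative.

# concept -> {neighbor: (left key, right key)}
EDGES = {
    "actor": {"appearance": ("actor.id", "appearance.actor_id")},
    "appearance": {"actor": ("appearance.actor_id", "actor.id"),
                   "movie": ("appearance.movie_id", "movie.id")},
    "movie": {"appearance": ("movie.id", "appearance.movie_id")},
}

def sql_between(src: str, dst: str) -> str:
    # Breadth-first search for the shortest concept path.
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nbr in EDGES.get(node, {}):
            if nbr not in prev:
                prev[nbr] = node
                queue.append(nbr)
    path = []
    node = dst
    while node is not None:
        path.append(node)
        node = prev[node]
    path.reverse()
    # Each hop on the path becomes one JOIN clause.
    sql = f"SELECT * FROM {path[0]}"
    for a, b in zip(path, path[1:]):
        left, right = EDGES[a][b]
        sql += f"\nJOIN {b} ON {left} = {right}"
    return sql

print(sql_between("actor", "movie"))
```

Here the intermediate `appearance` concept is discovered by the walk, not named by the user — which is exactly the mediation step Sphere removes.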

Worked example

The article's worked domain is onepiece (the manga) — a non-Netflix example chosen for clarity. Characters are related to Devil Fruits; each Devil Fruit has a type. The Turtle definition is published as onepiece.ttl in the open netflix-skunkworks/uda repo.
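
The shape of that domain, rendered as plain triples (property names here are paraphrased for illustration, not the repo's exact Turtle identifiers):

```python
# Character -> Devil Fruit -> fruit type, as triples.

TRIPLES = [
    ("Luffy", "a", "Character"),
    ("Luffy", "hasDevilFruit", "Gomu Gomu no Mi"),
    ("Gomu Gomu no Mi", "a", "DevilFruit"),
    ("Gomu Gomu no Mi", "hasType", "Paramecia"),
]

def fruit_type(character: str):
    """Traverse character -> fruit -> type; None if the path is absent."""
    fruits = [o for s, p, o in TRIPLES if s == character and p == "hasDevilFruit"]
    for fruit in fruits:
        for s, p, o in TRIPLES:
            if s == fruit and p == "hasType":
                return o
    return None

print(fruit_type("Luffy"))  # Paramecia
```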

Why "knowledge graph" here

UDA is explicitly "a data catalog unified with a schema registry" (concepts/schema-registry) with a hard requirement for semantic integration (concepts/semantic-interoperability). The wiki already documents knowledge graph under Dropbox Dash's framing — a retrieval relevance substrate for agents. Netflix UDA is the canonical wiki instance of the second framing: enterprise-data-integration substrate for schemas + pipelines. Same data structure, different load.

Caveats

See source page for full caveat list. Headline limits:

  • Architecture-overview voice — no fleet size, no graph cardinality, no QPS or query-latency numbers, no transpiler compile-time data, no SPARQL throughput.
  • Transpiler internals undisclosed — target languages named and "semantics preserved" asserted, but pipeline mechanics + conflict handling + versioning under domain-model evolution not described.
  • Named-graph resolution mechanism unnamed — post criticises owl:imports but doesn't name UDA's replacement.
  • Governance / ownership primitives not defined — flagged as gaps SHACL + RDF don't cover, but UDA's answer is not shown.
  • Mappings model underspecified — Mappings is named as a system domain but its Upper-extension shape is not shown.
  • Scale / adoption numbers absent — onboarded domains, PDM taxonomies shipped, Sphere usage, all undisclosed.
  • First post in a series — information infrastructure, UI, pipelines, and production-scale detail deferred to later posts.

Seen in

Last updated · 319 distilled / 1,201 read