CONCEPT Cited by 1 source

Data container

Definition — UDA's framing

A data container is a named repository of instance data that conforms to a specific schema language or type system and operates within the structural and operational constraints of a specific system. From the Netflix UDA post (sources/2025-06-14-netflix-model-once-represent-everywhere-uda):

"Data containers are repositories of information. They contain instance data that conform to their own schema languages or type systems: federated entities from GraphQL services, Avro records from Data Mesh sources, rows from Iceberg tables, or objects from Java APIs. Each container operates within the context of a system that imposes its own structural and operational constraints."
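As a minimal sketch of this definition, the four containers named in the quote can be written down as plain records. The class, its field names, and the `movie_*` container names are hypothetical illustrations, not part of UDA; only the system and schema-language pairings come from the post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContainer:
    name: str             # hypothetical identifier for the container
    system: str           # system that imposes structural/operational constraints
    schema_language: str  # schema language or type system the data conforms to
    instance_kind: str    # what one unit of instance data is called


# One entry per example in the quote; the container names are made up.
CONTAINERS = [
    DataContainer("movie_entity", "GraphQL services", "GraphQL", "federated entity"),
    DataContainer("movie_record", "Data Mesh sources", "Avro", "record"),
    DataContainer("movie_table", "Iceberg tables", "Iceberg", "row"),
    DataContainer("MovieObject", "Java APIs", "Java", "object"),
]


def describe(c: DataContainer) -> str:
    """Render a container in the terms of the definition above."""
    return f"{c.name}: {c.instance_kind} data in {c.system}, typed by {c.schema_language}"


for c in CONTAINERS:
    print(describe(c))
```

The same three attributes (system, schema language, kind of instance data) describe every entry, which is the sense in which the definition is uniform.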

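Named examples from the post:

  • Federated entities from GraphQL services
  • Avro records from Data Mesh sources
  • Rows from Iceberg tables
  • Objects from Java APIs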

Why UDA makes "data container" a first-class concept

The central problem UDA is solving — "define a model once, represent it everywhere" — requires a uniform abstraction for the many heterogeneous places instance data actually lives. A data container is that abstraction. From the post:

"Data container representations are data. They are faithful interpretations of the members of data systems as graph data. UDA captures the definition of these systems as their own domain models, the system domains. These models encode both the information architecture of the systems and the schemas of the data containers within. They provide a blueprint for translating the systems into graph representations."
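To make "translating the systems into graph representations" concrete, here is a rough sketch that projects one table row into subject/predicate/object tuples. The plain tuples stand in for graph data; the schema dictionary, the `ex:` predicate names, and `project_row` are assumptions made for illustration and do not reflect UDA's actual format or API.

```python
from typing import Any

# Hypothetical system-domain description of one container:
# column name -> graph predicate.
MOVIE_TABLE_SCHEMA = {
    "title": "ex:title",
    "release_year": "ex:releaseYear",
}


def project_row(container: str, key: Any, row: dict, schema: dict) -> list:
    """Turn one row into (subject, predicate, object) tuples.

    Columns that the schema does not describe are skipped.
    """
    subject = f"{container}/{key}"
    return [(subject, schema[col], value) for col, value in row.items() if col in schema]


row = {"title": "Example", "release_year": 2020, "_ingested_by": "job-7"}
triples = project_row("movie_table", 42, row, MOVIE_TABLE_SCHEMA)
for t in triples:
    print(t)
```

The schema acts as the "blueprint" here: it decides which members of the container appear in the graph and under which names.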

In UDA's terms, a data container is:

  1. A target for domain-model projection: mappings in the knowledge graph connect each domain concept to one or more data containers.
  2. A source for graph representation: every container has a system-domain model describing its schema and constraints, which UDA uses to project the container's contents into the graph.
  3. A pipeline endpoint: UDA-provisioned pipelines move data from container to container (GraphQL entities → Data Mesh → Iceberg data products).
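The first and third roles can be sketched with two small lookup structures. All names below (`MAPPINGS`, `PIPELINE`, the `graphql:`/`datamesh:`/`iceberg:` identifiers) are hypothetical; the post does not show how UDA stores mappings or pipelines, so this is only one plausible shape.

```python
# Role 1: mappings connect a domain concept to one or more containers.
MAPPINGS = {
    "Movie": ["graphql:Movie", "datamesh:movie_record", "iceberg:movie_table"],
}

# Role 3: each pipeline hop moves data from one container to another,
# following the GraphQL -> Data Mesh -> Iceberg chain named in the list above.
PIPELINE = [
    ("graphql:Movie", "datamesh:movie_record"),
    ("datamesh:movie_record", "iceberg:movie_table"),
]


def containers_for(concept: str) -> list:
    """Containers that a domain concept is mapped to (empty if unmapped)."""
    return MAPPINGS.get(concept, [])


def downstream(container: str) -> list:
    """All containers reachable from `container` by following pipeline hops."""
    reached = []
    frontier = [container]
    while frontier:
        current = frontier.pop()
        for src, dst in PIPELINE:
            if src == current and dst not in reached:
                reached.append(dst)
                frontier.append(dst)
    return reached


print(containers_for("Movie"))
print(downstream("graphql:Movie"))
```

Because containers are referred to only by identifier, the same lookup code works whether the endpoint is a GraphQL entity, an Avro record, or an Iceberg table.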

Containers vs tables / schemas / services

The term "data container" is deliberately broader than "table", "schema", or "service":

  • A table is one kind of container (rows in Iceberg / SQL).
  • A GraphQL type is another (federated entities).
  • An Avro record is another (Data Mesh).
  • A Java API object is another (typed in-memory values).

The unifying property: each is "a repository of instance data that conforms to a schema language." Because UDA models every kind of container in these same terms, mappings and pipelines can treat them uniformly.

Seen in

  • sources/2025-06-14-netflix-model-once-represent-everywhere-uda