
SYSTEM Cited by 1 source

Netflix Sphere (self-service reporting)

Sphere is Netflix's self-service operational reporting tool for business users, and the second named production consumer of UDA (Source: sources/2025-06-14-netflix-model-once-represent-everywhere-uda).

What Sphere does

From the Netflix post: "Sphere uses UDA to catalog and relate business concepts across systems, enabling discovery through familiar terms like 'actor' or 'movie.' Once concepts are selected, Sphere walks the knowledge graph and generates SQL queries to retrieve data from the warehouse, no manual joins or technical mediation required."

The user workflow:

  1. Discover business concepts by searching UDA's knowledge graph in familiar terms (actor, movie, asset, …) — domain-model vocabulary, not warehouse-table vocabulary.
  2. Select the concepts to report on.
  3. Sphere walks the graph from each selected concept node to its underlying data container in the warehouse — traversing mapping edges that UDA maintains between concepts and containers.
  4. Sphere generates SQL from that walk: the path encodes the JOIN keys, and the container schemas supply the tables and columns. The result is an executable query over the warehouse (the post does not name the engine; Iceberg with Spark or Presto is a plausible guess from other Netflix posts).
  5. Sphere runs the query and returns results — "no manual joins or technical mediation required."
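Steps 1 to 3 can be pictured with a minimal sketch. It assumes a tiny in-memory catalog; the `Concept` class, the `CATALOG` entries, and the `dw.*` table names are hypothetical, since the post does not publish UDA's concepts or mappings.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class Concept:
    name: str                 # business-facing concept name
    labels: tuple[str, ...]   # familiar search terms for the concept
    container: str            # hypothetical warehouse table the concept maps to

# Hypothetical catalog; not taken from the Netflix post.
CATALOG = [
    Concept("Actor", ("actor", "performer", "talent"), "dw.talent_dim"),
    Concept("Movie", ("movie", "film", "title"), "dw.title_dim"),
    Concept("Asset", ("asset", "artwork"), "dw.asset_dim"),
]

def discover(term: str) -> list[Concept]:
    """Step 1: find concepts whose name or labels match a familiar term."""
    needle = term.strip().lower()
    return [c for c in CATALOG if needle == c.name.lower() or needle in c.labels]

def resolve_containers(selected: list[Concept]) -> dict[str, str]:
    """Step 3, first hop: follow the mapping edge from concept to container."""
    return {c.name: c.container for c in selected}

# Step 2: the user selects the concepts to report on.
selected = discover("film") + discover("actor")
print(resolve_containers(selected))
```

The user only ever types business terms such as "film"; the table names appear solely on the mapping side.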

Graph walk → SQL is the load-bearing primitive

Sphere canonicalises patterns/graph-walk-sql-generation on the wiki: the graph path is the JOIN specification. Because UDA maintains mappings from every domain concept to every data container that holds its instance data, and because those mappings form a connected graph with the knowledge graph itself, any two concepts that can be connected by a graph walk can be joined in SQL — without the human having to know which tables, which columns, or which join keys are involved.
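A minimal sketch of "the graph path is the JOIN specification", assuming mapping edges that carry join columns. The table names, column names, and the breadth-first search are illustrative assumptions; the post does not describe Sphere's actual walk algorithm.

```python
from collections import deque

# Hypothetical mapping graph: (left_table, left_col, right_table, right_col).
EDGES = [
    ("talent_dim", "talent_id", "credit_fact", "talent_id"),
    ("credit_fact", "title_id", "title_dim", "title_id"),
    ("title_dim", "title_id", "asset_dim", "title_id"),
]

def build_adjacency(edges):
    """Index edges in both directions so the walk can start from either side."""
    adj = {}
    for left, lcol, right, rcol in edges:
        adj.setdefault(left, []).append((right, lcol, rcol))
        adj.setdefault(right, []).append((left, rcol, lcol))
    return adj

def shortest_walk(adj, start, goal):
    """Breadth-first search returning hops as (src, src_col, dst, dst_col), or None."""
    if start == goal:
        return []
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt, src_col, dst_col in adj.get(node, []):
            if nxt in parents:
                continue
            parents[nxt] = (node, src_col, dst_col)
            if nxt == goal:
                hops, cur = [], goal
                while parents[cur] is not None:
                    prev, s, d = parents[cur]
                    hops.append((prev, s, cur, d))
                    cur = prev
                return hops[::-1]
            queue.append(nxt)
    return None  # no connecting walk, so no join can be generated

def walk_to_sql(start, hops, columns):
    """Each hop of the walk becomes one JOIN clause."""
    lines = [f"SELECT {', '.join(columns)}", f"FROM {start}"]
    for src, src_col, dst, dst_col in hops:
        lines.append(f"JOIN {dst} ON {src}.{src_col} = {dst}.{dst_col}")
    return "\n".join(lines)

adj = build_adjacency(EDGES)
hops = shortest_walk(adj, "talent_dim", "title_dim")
sql = walk_to_sql("talent_dim", hops, ["talent_dim.name", "title_dim.title"])
print(sql)
```

The caller names only the two endpoints; the intermediate `credit_fact` table and both join keys come from the graph.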

This is the operational answer to the four UDA pain points (duplicated models, inconsistent terminology, data-quality issues, limited connectivity) at the analytics layer:

  • Business users query in business vocabulary.
  • JOIN correctness is a consequence of graph correctness.
  • Connectivity between systems is an automatic consequence of the mapping graph being connected.

Why this belongs to the UDA thesis

Sphere embodies patterns/model-once-represent-everywhere applied to analytics: one domain model, one mapping graph, one concept vocabulary → unbounded ad-hoc reporting queries without per-report pipeline engineering. The analyst selects concepts; Sphere, walking UDA's graph, writes the SQL.

Sphere's place in the performance discussion

The UDA post doesn't position it this way, but Sphere implicitly sidesteps a classic RDF-scale concern: SPARQL over triple stores has historically underperformed tabular queries at warehouse scale. Sphere uses the knowledge graph for compilation only: it walks the graph to produce a SQL query, then runs that SQL natively against the warehouse. The knowledge graph is the planner; the warehouse is the executor. Read this way, the design answers the scale worry, although that reading is this page's interpretation rather than the post's.
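The planner/executor split can be sketched as follows. This is an illustration under stated assumptions, not Sphere's implementation: `sqlite3` stands in for the unnamed warehouse engine, and the `PATH` hops, table names, and toy rows are hypothetical.

```python
import sqlite3

# Hypothetical precomputed walk: (left_table, left_col, right_table, right_col).
PATH = [
    ("talent", "talent_id", "credit", "talent_id"),
    ("credit", "title_id", "title", "title_id"),
]

def plan(path, select_cols):
    """Planner side: turn a graph path into SQL text. No data is read here."""
    sql = f"SELECT {', '.join(select_cols)} FROM {path[0][0]}"
    for left, lcol, right, rcol in path:
        sql += f" JOIN {right} ON {left}.{lcol} = {right}.{rcol}"
    return sql + " ORDER BY 1, 2"

def execute(conn, sql):
    """Executor side: the engine runs plain SQL and never sees the graph."""
    return conn.execute(sql).fetchall()

# Toy in-memory tables standing in for warehouse containers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE talent (talent_id INTEGER, name TEXT);
    CREATE TABLE title  (title_id INTEGER, title TEXT);
    CREATE TABLE credit (talent_id INTEGER, title_id INTEGER);
    INSERT INTO talent VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO title  VALUES (10, 'First Film'), (11, 'Second Film');
    INSERT INTO credit VALUES (1, 10), (2, 10), (2, 11);
""")
rows = execute(conn, plan(PATH, ["talent.name", "title.title"]))
print(rows)
```

Graph traversal happens once, at planning time, over metadata; the row-level work is done entirely by the SQL engine.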

Caveats

  • Architecture-overview voice. No user count, no QPS, no generated-SQL complexity statistics, no cost comparison to hand-written analytics queries.
  • Warehouse engine(s) not named. The post says "the warehouse", presumably Iceberg-on-S3 with Spark or Presto per other Netflix posts, but the post doesn't pin this down.
  • Graph-walk algorithm not described. Multi-path disambiguation, ambiguous-join resolution, and the UI affordances for surfacing alternative walks are not covered.
  • Column-level exposure policy not discussed. Sensitive attributes, row-level security, and access control on graph edges are not mentioned.
  • SQL target dialect(s) not named.
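To make the multi-path caveat concrete, the sketch below enumerates every simple walk between two containers in a hypothetical mapping graph. The graph and the enumeration are assumptions for illustration only; the post does not say how Sphere detects or resolves such ambiguity.

```python
def all_simple_walks(adj, start, goal, _seen=()):
    """Return every cycle-free walk from start to goal as a list of nodes."""
    seen = _seen + (start,)
    if start == goal:
        return [[goal]]
    walks = []
    for nxt in adj.get(start, []):
        if nxt in seen:
            continue  # never revisit a node within a single walk
        for tail in all_simple_walks(adj, nxt, goal, seen):
            walks.append([start] + tail)
    return walks

# Hypothetical graph in which two link tables connect the same pair of containers.
ADJ = {
    "talent": ["acting_credit", "directing_credit"],
    "acting_credit": ["talent", "title"],
    "directing_credit": ["talent", "title"],
    "title": ["acting_credit", "directing_credit"],
}

walks = all_simple_walks(ADJ, "talent", "title")
for walk in walks:
    print(" -> ".join(walk))
```

Both walks yield valid SQL, but they answer different questions (titles a person acted in versus titles a person directed), which is why the choice of walk has to be resolved somewhere.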

Seen in

  • sources/2025-06-14-netflix-model-once-represent-everywhere-uda
