
CONCEPT Cited by 1 source

Entity URI namespace

Definition

An entity URI namespace is a catalog-design primitive that assigns every entity a globally-unique URI of the form:

<scheme>://<entity-type>/<source-system>/<source-id>

The URI encodes:

  • Routing — which source system to call for hydration.
  • Identity — a globally-unique handle that doesn't collide across source systems.
  • Type — the entity category, for filtering and indexing.
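A minimal Python sketch of parsing and re-emitting such a URI follows. It assumes that no component contains `/` and that the scheme is a bare lowercase token such as `aip`. The `EntityURI` class and its regex are hypothetical illustrations, not Netflix's implementation.

```python
import re
from dataclasses import dataclass

# Hypothetical grammar: <scheme>://<entity-type>/<source-system>/<source-id>
# Assumes no component contains "/" (a real system may need percent-encoding).
_URI_RE = re.compile(r"^([a-z][a-z0-9+.-]*)://([^/]+)/([^/]+)/([^/]+)$")


@dataclass(frozen=True)
class EntityURI:
    scheme: str         # namespace owner, e.g. "aip"
    entity_type: str    # type: for filtering and indexing
    source_system: str  # routing: which system to call for hydration
    source_id: str      # identity within that source system

    @classmethod
    def parse(cls, uri: str) -> "EntityURI":
        match = _URI_RE.match(uri)
        if match is None:
            raise ValueError(f"not an entity URI: {uri!r}")
        return cls(*match.groups())

    def __str__(self) -> str:
        return f"{self.scheme}://{self.entity_type}/{self.source_system}/{self.source_id}"


uri = EntityURI.parse("aip://model/registry/ranking-model-v5-20XX0101")
print(uri.entity_type, uri.source_system, uri.source_id)
# model registry ranking-model-v5-20XX0101
```

Parsing and formatting round-trip, so the string form can be stored anywhere while code works with the typed fields.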

This is the precondition for cross-system graph traversal: foreign keys can point to URIs across source systems without ambiguity, and a single dispatcher can fan out hydration calls correctly.
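The dispatcher idea can be sketched in Python as follows: group URIs by their source-system component and make one batched hydration call per system. The hydrator signature `(entity_type, local_ids) -> {local_id: record}` and the `stub` helper are assumptions made for illustration. Batches are keyed by (source system, entity type) so that two entity types in the same system can reuse a local ID without colliding.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical hydrator signature: (entity_type, source-local IDs) -> {local_id: record}.
# A real hydrator would call the source system's API in one batched request.
Hydrator = Callable[[str, List[str]], Dict[str, dict]]


def hydrate(uris: Iterable[str], hydrators: Dict[str, Hydrator]) -> Dict[str, dict]:
    """Fan out one batched call per (source system, entity type); return {uri: record}."""
    batches: Dict[Tuple[str, str], Dict[str, str]] = defaultdict(dict)
    for uri in uris:
        _, rest = uri.split("://", 1)
        entity_type, source_system, local_id = rest.split("/", 2)
        batches[(source_system, entity_type)][local_id] = uri
    results: Dict[str, dict] = {}
    for (source_system, entity_type), uri_by_local_id in batches.items():
        # The source-system component alone selects the callback; a KeyError
        # here means a URI points at a system with no registered hydrator.
        records = hydrators[source_system](entity_type, list(uri_by_local_id))
        for local_id, record in records.items():
            results[uri_by_local_id[local_id]] = record
    return results


def stub(system: str) -> Hydrator:
    # Stand-in hydrator that echoes back what it was asked for.
    return lambda entity_type, ids: {i: {"system": system, "type": entity_type, "id": i} for i in ids}


out = hydrate(
    ["aip://model/registry/ranking-model-v5-20XX0101", "aip://user/identity/alice"],
    {"registry": stub("registry"), "identity": stub("identity")},
)
print(out["aip://user/identity/alice"])
# {'system': 'identity', 'type': 'user', 'id': 'alice'}
```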

The Netflix MDS instance

"The normalization process standardizes field names and formats. For example, platform-specific IDs become global AIP URIs, owner_emails becomes owners with resolved user URIs, and labels become tags. Foreign keys like pipeline_run_id are transformed into entity references." (source: sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph)

Examples from the post (worked-example identifiers, illustrative):

  • aip://model/registry/ranking-model-v5-20XX0101
  • aip://pipeline-run/orchestrator/train-weekly-ranking-20XX0101
  • aip://user/identity/alice
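The normalization step quoted above can be sketched as a per-source function. This is a hypothetical Python illustration: the raw record layout (including the `id` field), the `EMAIL_TO_USER` directory, and the example email address are assumptions. Only the field names `owner_emails`, `labels`, and `pipeline_run_id` come from the quote.

```python
from typing import Dict

# Hypothetical email -> identity-system username directory. In practice this
# resolution would be a lookup against an identity service.
EMAIL_TO_USER: Dict[str, str] = {"alice@example.com": "alice"}


def normalize_registry_model(raw: dict) -> dict:
    """Normalize a hypothetical Model Registry record into URI-keyed form."""
    return {
        # Platform-specific ID -> global URI
        "uri": f"aip://model/registry/{raw['id']}",
        # owner_emails -> owners, each resolved to a user URI
        "owners": [f"aip://user/identity/{EMAIL_TO_USER[e]}" for e in raw["owner_emails"]],
        # labels -> tags (renamed field, same values)
        "tags": list(raw.get("labels", [])),
        # Foreign key pipeline_run_id -> entity reference in the orchestrator namespace
        "pipeline_run": f"aip://pipeline-run/orchestrator/{raw['pipeline_run_id']}",
    }


raw = {
    "id": "ranking-model-v5-20XX0101",
    "owner_emails": ["alice@example.com"],
    "labels": ["ranking"],
    "pipeline_run_id": "train-weekly-ranking-20XX0101",
}
print(normalize_registry_model(raw)["pipeline_run"])
# aip://pipeline-run/orchestrator/train-weekly-ranking-20XX0101
```

Each source system would get its own normalizer. Downstream code only ever sees the normalized shape.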

Decomposition:

URI part                              Meaning
aip                                   Scheme: "AI Platform" namespace owned by Netflix MDS
model / pipeline-run / user           Entity type: drives indexing and UI dispatch
registry / orchestrator / identity    Source system: drives hydration callback routing
ranking-model-v5-20XX0101, etc.       Source-system-local identifier

Why a URI namespace beats raw IDs

A naive design would store source-system IDs directly. Problems:

  • Collisions. A model registry might issue ID 123 for a model; an experimentation platform might issue 123 for a test. A foreign key target_id: 123 is ambiguous.
  • No type information. "What is entity 123?" requires a separate catalog lookup or a schema-encoded type column.
  • No routing information. When a foreign key points to another entity, the consumer needs out-of-band knowledge of which source system to call to hydrate it.
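The collision problem is easy to reproduce. In this small Python illustration, the source-system label `experiments` and the entity type `ab-test` are hypothetical. Keying by raw ID silently drops a record, while keying by URI keeps both.

```python
# Two source systems that happen to issue the same local ID "123".
registry_record = {"system": "registry", "type": "model", "id": "123"}
experiments_record = {"system": "experiments", "type": "ab-test", "id": "123"}

# Keyed by raw ID: the second record silently overwrites the first.
by_raw_id = {}
for rec in (registry_record, experiments_record):
    by_raw_id[rec["id"]] = rec

# Keyed by URI: both survive, and each key also carries type and routing.
by_uri = {}
for rec in (registry_record, experiments_record):
    by_uri[f"aip://{rec['type']}/{rec['system']}/{rec['id']}"] = rec

print(len(by_raw_id), len(by_uri))
# 1 2
```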

A URI bundles all three into a single string:

aip://model/registry/123
^^^   ^^^^^ ^^^^^^^^ ^^^
|     |     |        |
|     |     |        +-- source-system-local ID (no global uniqueness needed)
|     |     +----------- routing key (which API to call)
|     +----------------- type (for filter / dispatch)
+----------------------- namespace owner

A foreign key written as a URI is self-describing — downstream consumers can resolve it without additional context.

Why this enables cross-system graph traversal

Without normalization to URIs:

"Without normalization, downstream consumers would need to understand every source system's schema." (source: sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph)

Each consumer would have to know:

  • That pipeline_run_id from Model Registry refers to the orchestrator's namespace.
  • That owner_emails from Model Registry holds email addresses, whereas owner_id from Pipeline Orchestrator holds numeric LDAP IDs.
  • That A/B test cells are referenced by (test_id, cell_number) in some sources but by a flat cell_id in others.

Normalizing to URIs makes all of this uniform. A relationship edge (source_uri, edge_type, target_uri) is a complete fact without further context.
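Because every edge endpoint is a URI, graph traversal does not need to know which system an entity came from. The following is a minimal Python sketch of a breadth-first traversal over `(source_uri, edge_type, target_uri)` triples. The edge-type names (`produced_by`, `owned_by`, `triggered_by`) are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source_uri, edge_type, target_uri)

MODEL = "aip://model/registry/ranking-model-v5-20XX0101"
RUN = "aip://pipeline-run/orchestrator/train-weekly-ranking-20XX0101"
ALICE = "aip://user/identity/alice"

# Illustrative edges spanning three source systems; edge types are hypothetical.
EDGES: List[Edge] = [
    (MODEL, "produced_by", RUN),
    (MODEL, "owned_by", ALICE),
    (RUN, "triggered_by", ALICE),
]


def reachable(start: str, edges: List[Edge]) -> Set[str]:
    """Breadth-first traversal over URI edges, regardless of source system."""
    adjacency: Dict[str, List[str]] = defaultdict(list)
    for src, _, dst in edges:
        adjacency[src].append(dst)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}
```

For example, `reachable(MODEL, EDGES)` returns the pipeline run and the user, even though they live in the orchestrator and the identity system respectively.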

Generalizes to non-MDS catalogs

The pattern recurs across catalog / metadata / lineage systems:

  • AWS resource ARNs: arn:aws:s3:::my-bucket/object-key. Same idea: a fixed arn prefix followed by partition, service, region, account, and resource (S3 bucket ARNs leave region and account empty).
  • Kubernetes resource references: apiVersion + kind + namespace + name, encoded across multiple fields but semantically equivalent.
  • Datadog / OpenTelemetry resource attributes: service.name, cloud.provider, etc. Flatter than a URI, but the same information.
  • Spotify Backstage entity refs: <kind>:<namespace>/<name>, a near-identical shape.
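For comparison, the following Python sketch parses an ARN. An ARN is also a flat string with positional components (`arn:partition:service:region:account-id:resource`). Splitting with `maxsplit=5` keeps any `:` inside the resource part intact. The `parse_arn` helper is a hypothetical illustration, not an AWS SDK function.

```python
def parse_arn(arn: str) -> dict:
    """Split an ARN into its positional fields (hypothetical helper)."""
    # arn:partition:service:region:account-id:resource
    # maxsplit=5 leaves colons inside the resource part untouched.
    prefix, partition, service, region, account, resource = arn.split(":", 5)
    if prefix != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return {"partition": partition, "service": service, "region": region,
            "account": account, "resource": resource}


print(parse_arn("arn:aws:s3:::my-bucket/object-key"))
# {'partition': 'aws', 'service': 's3', 'region': '', 'account': '', 'resource': 'my-bucket/object-key'}
```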

The URI form (vs. multi-field structured ID) has one extra property: it's a string, so it can be used directly as a key in any KV store, a token in a search index, or a foreign key column in any DB. Multi-field structured IDs require encoding to a string at every interop point.
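The following Python sketch uses the standard-library `sqlite3` module to show the "foreign key column in any DB" property. A single `TEXT` column can reference entities from any source system. Because the entity type is a URI prefix, filtering by type becomes a string-prefix match. The table layout and edge-type names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# URIs are plain TEXT, so one edges table can reference any source system
# without per-system key columns.
conn.execute("CREATE TABLE edges (source_uri TEXT, edge_type TEXT, target_uri TEXT)")
conn.execute("CREATE INDEX edges_by_target ON edges (target_uri)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [
        ("aip://model/registry/ranking-model-v5-20XX0101", "owned_by", "aip://user/identity/alice"),
        ("aip://pipeline-run/orchestrator/train-weekly-ranking-20XX0101", "triggered_by", "aip://user/identity/alice"),
    ],
)
# "Models linked to Alice": the type filter is a prefix match on the URI,
# with no join against a separate type table.
rows = conn.execute(
    "SELECT source_uri FROM edges WHERE target_uri = ? AND source_uri LIKE ?",
    ("aip://user/identity/alice", "aip://model/%"),
).fetchall()
print(rows)
# [('aip://model/registry/ranking-model-v5-20XX0101',)]
```

A multi-field ID would need either several key columns in every referencing table or an ad hoc string encoding at this boundary.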

Design considerations

When designing such a namespace:

  • Stable scheme. Don't rename your scheme later — every reference becomes wrong.
  • Stable source-system labels. Same constraint, scoped to one level deeper.
  • Source-system-local IDs need not be globally unique. The URI scopes them.
  • Don't encode mutable state. Don't put the model version number or environment tag in the URI; those should be attributes, not identity.
  • Reserve a generic unknown source-system bucket for entities that pre-exist a proper source-system integration.
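The following Python constructor sketches how these rules might be enforced at URI-creation time. The `KNOWN_SOURCE_SYSTEMS` registry, the `unknown` bucket label, and the `dataset` entity type in the usage line are all hypothetical. A `None` source system opts in explicitly to the unknown bucket, whereas a misspelled label is rejected so that labels stay stable.

```python
from typing import Optional

SCHEME = "aip"  # stable: never renamed once URIs have been issued
# Hypothetical registry of stable source-system labels.
KNOWN_SOURCE_SYSTEMS = frozenset({"registry", "orchestrator", "identity"})
UNKNOWN_SYSTEM = "unknown"  # hypothetical bucket for not-yet-integrated sources


def make_uri(entity_type: str, source_system: Optional[str], source_id: str) -> str:
    """Build an entity URI, enforcing registered source-system labels."""
    if source_system is None:
        # Explicit opt-in to the generic bucket for pre-integration entities.
        source_system = UNKNOWN_SYSTEM
    elif source_system not in KNOWN_SOURCE_SYSTEMS:
        # Reject typos instead of silently minting a new, unstable label.
        raise ValueError(f"unregistered source system: {source_system!r}")
    for part in (entity_type, source_id):
        if not part or "/" in part:
            raise ValueError(f"invalid URI component: {part!r}")
    return f"{SCHEME}://{entity_type}/{source_system}/{source_id}"


print(make_uri("model", "registry", "123"))
# aip://model/registry/123
print(make_uri("dataset", None, "legacy-42"))
# aip://dataset/unknown/legacy-42
```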

Seen in
