PATTERN Cited by 1 source

Async graph enrichment job

Definition

The async-graph-enrichment-job pattern is a scheduled background-job loop that discovers, materializes, and re-indexes cross-entity relationships in a metadata graph. It walks multi-hop paths in the background, writes derived edges back as new facts, and triggers search re-indexing — without blocking the real-time ingestion path.

"This asynchronous approach allows MDS to handle the computational cost of graph formation without blocking real-time event ingestion. It also enables retry logic and gradual enrichment as new entities become available." (sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph)

Canonical workflow

From Netflix MDS:

  1. Identify candidates — find entities marked uncached or with unresolved references.
  2. Hydrate relationships — query source-of-truth systems to fetch related entity details (concepts/source-of-truth-hydration).
  3. Materialize edges — write discovered relationships back to the fact store.
  4. Re-index — trigger search index updates for affected entities.
  5. Mark as enriched — update entity status to prevent redundant processing.
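The five steps above can be sketched as a single pass over an in-memory store. This is a minimal illustration, not the MDS implementation: `run_enrichment_cycle`, the `status` dict, the `fetch_related` callback, and the `reindex_queue` list are all hypothetical names standing in for the fact store, the source-of-truth clients, and the search indexer.

```python
def run_enrichment_cycle(status, fetch_related, edges, reindex_queue):
    """Run one pass of the five-step loop; return the ids enriched in this pass.

    status:        dict of entity id -> "uncached" | "enriched" (assumed state model)
    fetch_related: callable(entity_id) -> list of related ids, or None if the
                   source system cannot answer yet (hypothetical hydration client)
    edges:         set of frozenset({a, b}) acting as the fact store
    reindex_queue: list of entity ids whose search documents need refreshing
    """
    enriched_now = []
    # 1. Identify candidates: everything not yet marked as enriched.
    candidates = [eid for eid, state in status.items() if state != "enriched"]
    for eid in candidates:
        # 2. Hydrate relationships from the source of truth.
        related = fetch_related(eid)
        if related is None:
            continue  # leave the entity unmarked so a later cycle retries it
        # 3. Materialize edges (set insertion makes repeated writes a no-op).
        for other in related:
            edges.add(frozenset((eid, other)))
        # 4. Re-index: queue the entity for a search index update.
        reindex_queue.append(eid)
        # 5. Mark as enriched to prevent redundant processing.
        status[eid] = "enriched"
        enriched_now.append(eid)
    return enriched_now


# Usage with made-up ids; a plain dict plays the source-of-truth system.
status = {"model-1": "uncached", "run-7": "uncached"}
source = {"model-1": ["run-7"], "run-7": []}
edges, reindex_queue = set(), []
done = run_enrichment_cycle(status, source.get, edges, reindex_queue)
```

Because an entity is only marked after hydration succeeds, a failed lookup needs no separate retry bookkeeping: the next scheduled pass simply finds the entity again in step 1.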

The Netflix MDS instance

Netflix MDS uses this pattern to derive cross-system relationships between entities across six source systems (Pipeline Orchestration, Model Registry, Feature Store, Experimentation Platform, Datasets, Identity Platform).

Worked example: connecting a model instance to its A/B tests via a 3-hop walk:

Step 1: Hydrate model instance
        → discovers `pipeline_run_id`
Step 2: Hydrate pipeline run
        → discovers `ab_test_cells`
Step 3: Hydrate A/B test
        → resolves test details
Result: Write back direct edge `Model Instance ↔ A/B Test`
        Trigger Elasticsearch re-index
        Mark all hydrated entities as enriched
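A rough sketch of that walk, assuming each source system can be modelled as a dictionary lookup. The three dictionaries, the record shapes (including the `test_id` key inside `ab_test_cells`), and the function name are hypothetical; only the field names `pipeline_run_id` and `ab_test_cells` come from the example above.

```python
# Hypothetical in-memory stand-ins for three source-of-truth systems.
MODEL_REGISTRY = {"model-instance-1": {"pipeline_run_id": "run-7"}}
PIPELINE_ORCHESTRATION = {"run-7": {"ab_test_cells": [{"test_id": "test-42", "cell": 2}]}}
EXPERIMENTATION_PLATFORM = {"test-42": {"name": "ranker-v2"}}


def derive_model_to_ab_test_edges(model_instance_id):
    """Walk model instance -> pipeline run -> A/B test and return direct edges.

    Returns a list of (model_instance_id, test_id) pairs. An empty list means
    some hop could not be resolved yet and the walk should be retried later.
    """
    # Step 1: hydrate the model instance to discover its pipeline run.
    model = MODEL_REGISTRY.get(model_instance_id)
    if model is None:
        return []
    # Step 2: hydrate the pipeline run to discover its A/B test cells.
    run = PIPELINE_ORCHESTRATION.get(model["pipeline_run_id"])
    if run is None:
        return []
    # Step 3: hydrate each A/B test; keep only the ones that resolve.
    direct_edges = []
    for cell in run["ab_test_cells"]:
        if cell["test_id"] in EXPERIMENTATION_PLATFORM:
            direct_edges.append((model_instance_id, cell["test_id"]))
    return direct_edges


edges = derive_model_to_ab_test_edges("model-instance-1")
```

The returned pairs are what would be written back as the direct `Model Instance ↔ A/B Test` edge, so later reads need a single lookup instead of repeating the walk.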

"MDS doesn't just store what it's told; it derives new knowledge by walking the graph in the background." (sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph)

Why async

Three structural reasons synchronous enrichment would be the wrong choice:

  1. Unbounded fan-out per ingestion event. A new model instance may transitively reference dozens of other entities; hydrating all of them synchronously would block ingestion for an unbounded amount of time.
  2. Forward references. A model instance may arrive before its pipeline run does. The async loop will pick up the relationship on a later cycle when the pipeline run arrives.
  3. Retry locality. A source-system outage shouldn't kill ingestion. The async loop retries on the next cycle.
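The forward-reference case can be illustrated with a small sketch in which a model instance arrives before its pipeline run. The `enrich_pending` function and its arguments are hypothetical; a real job would consult the source systems rather than an in-memory set.

```python
def enrich_pending(pending, known_runs, edges):
    """Try to resolve each pending model -> pipeline-run reference.

    pending:    dict of model id -> referenced pipeline run id
    known_runs: set of pipeline run ids that have been ingested so far
    edges:      set of (model id, run id) tuples acting as the fact store
    Returns the references that are still unresolved after this cycle.
    """
    still_pending = {}
    for model_id, run_id in pending.items():
        if run_id in known_runs:
            edges.add((model_id, run_id))     # reference resolved: materialize
        else:
            still_pending[model_id] = run_id  # forward reference: retry next cycle
    return still_pending


edges, known_runs = set(), set()
pending = {"model-1": "run-7"}                        # model arrives first
pending = enrich_pending(pending, known_runs, edges)  # cycle 1: nothing resolves
known_runs.add("run-7")                               # pipeline run arrives later
pending = enrich_pending(pending, known_runs, edges)  # cycle 2: edge is written
```

Ingestion of `model-1` never waits on `run-7`; the unresolved reference simply stays in the pending set until a cycle can resolve it. The same mechanism covers a source-system outage, where the lookup fails for a while and then succeeds.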

Implementation primitives

To run this pattern at scale:

  • Mark-and-sweep state machine. Each entity has a state: uncached / partially-resolved / fully-enriched. The job scans for non-fully-enriched entities; the marking is the retry mechanism.
  • Rate-limited hydration. Same constraints as patterns/thin-event-plus-source-hydration — bounded calls per source per second.
  • Idempotent edge writes. Writing an edge that already exists must be a no-op — important for handling enrichment job retries.
  • Per-entity-type enrichment policies. Different entity types have different multi-hop paths to materialize.
  • Last-enriched timestamp persisted on each entity, surfaced to consumers (see "Staleness is a first-class field" below).
  • Batch coordination. Multiple parallel enrichment jobs must not duplicate work — typically via a claim-and-process queue.
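Three of these primitives (the state machine, idempotent edge writes, and claim-and-process coordination) fit in one small in-process sketch. The `EnrichmentStore` class and its methods are hypothetical, and a `threading.Lock` stands in for whatever atomic claim mechanism a real queue or database would provide; rate limiting and per-type policies are left out.

```python
import threading
from enum import Enum


class State(Enum):
    UNCACHED = "uncached"
    PARTIALLY_RESOLVED = "partially-resolved"
    FULLY_ENRICHED = "fully-enriched"


class EnrichmentStore:
    """Toy store combining entity state, claims, and an idempotent edge set."""

    def __init__(self):
        self._lock = threading.Lock()
        self.state = {}       # entity id -> State
        self.claimed = set()  # entity ids currently held by a worker
        self.edges = set()    # frozenset({a, b}) per undirected edge

    def add_entity(self, entity_id):
        with self._lock:
            self.state.setdefault(entity_id, State.UNCACHED)

    def claim_next(self):
        """Atomically claim one unclaimed, non-fully-enriched entity, or None."""
        with self._lock:
            for entity_id, state in self.state.items():
                if state is not State.FULLY_ENRICHED and entity_id not in self.claimed:
                    self.claimed.add(entity_id)
                    return entity_id
            return None

    def write_edge(self, a, b):
        """Idempotent write: returns True only when the edge did not exist yet."""
        edge = frozenset((a, b))
        with self._lock:
            if edge in self.edges:
                return False
            self.edges.add(edge)
            return True

    def release(self, entity_id, new_state):
        """Record the outcome and give up the claim."""
        with self._lock:
            self.state[entity_id] = new_state
            self.claimed.discard(entity_id)
```

A worker would loop on `claim_next()`, hydrate, call `write_edge` for each discovered relationship, and `release` the entity as `FULLY_ENRICHED` or `PARTIALLY_RESOLVED`. Anything released as partially resolved is picked up again by a later scan, which is the sense in which the marking doubles as the retry mechanism.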

Staleness is a first-class field

"Because enrichment is asynchronous, newly discovered relationships may appear with a short delay after the underlying entities are created (typically minutes rather than seconds). We track when each entity was last enriched and surface this timestamp in the AIP Portal, so practitioners can reason about staleness." (sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph)

Being honest about staleness is a key part of the pattern. Hide the lag and users will act on stale data without knowing it; surface it and they can reason about it correctly.
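One way a consumer-facing layer might turn the persisted timestamp into something readable is sketched below. The function name and message wording are hypothetical; the source only says that the timestamp is tracked and shown in the AIP Portal.

```python
from datetime import datetime, timedelta, timezone


def describe_staleness(last_enriched_at, now=None):
    """Return a short human-readable staleness note for one entity.

    last_enriched_at: timezone-aware datetime, or None if never enriched
    now:              injectable clock value, defaults to the current UTC time
    """
    if last_enriched_at is None:
        return "not yet enriched"
    if now is None:
        now = datetime.now(timezone.utc)
    minutes = int((now - last_enriched_at).total_seconds() // 60)
    return f"relationships last enriched {minutes} min ago"


# Usage with a fixed clock so the result is reproducible.
now = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
note = describe_staleness(now - timedelta(minutes=7), now=now)
```

Keeping "never enriched" distinct from "enriched a while ago" matters here: a missing edge on a never-enriched entity says nothing about whether the relationship exists.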

Distinct from sync graph traversal

| Aspect | Sync graph walk on read | Async enrichment + materialized edges |
| --- | --- | --- |
| Read latency | High (multi-hop walk + source API calls) | Low (single edge lookup) |
| Read consistency | Always fresh | Up to N minutes stale |
| Write path latency | Just persist entity | Just persist entity |
| Background cost | None | Scheduled job continuously running |
| Best for | Small graphs, infrequent multi-hop queries | Large graphs, frequent multi-hop queries |

Distinct from CDC pipelines

CDC (change data capture) replicates the same data from one place to another. Async enrichment derives new facts (edges) from existing data. CDC is plumbing; this pattern is a graph-level abstraction above it.

Open challenges (per Netflix MDS)

  • Metadata quality. "Today, MDS ensures data consistency through source-of-truth hydration and schema validation at ingestion. Background enrichment jobs continuously infer relationships and materialize entities from source systems. However, challenges remain in ensuring completeness and timeliness at scale."
  • Advanced relationship inference. "Beyond explicit relationships declared in source systems, how do we infer implicit connections? Can we detect that two models serve similar purposes based on shared features?" — i.e., extend the enrichment job from deterministic-walk-derived edges to learned / probabilistic edges. "We are in the early stages of exploring these ideas."

Seen in
