
PATTERN Cited by 1 source

Thin event plus source-of-truth hydration

Definition

The thin-event-plus-source-hydration pattern is an ingestion shape where:

  1. Producers emit minimal events containing only an entity identifier and event type — no state.
  2. Consumers receive the event, validate the schema, and call back to the source system's API to fetch the complete current state.
  3. Consumers transform the response into their own model and persist it locally.

The event stream is purely a change-notification trigger; authoritative state lives in source systems and is fetched on demand.

"The event stream becomes a notification of change rather than a log of changes." (sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph)

The Netflix MDS instance

Netflix MDS ingests from six source systems via Kafka + SNS/SQS using exactly this pattern.

Producer side:

{
  "event_type": "model_instance_created",
  "instance_id": "ranking-model-v5-20XX0101"
}

Consumer side, per-event-type hydration contract:

  1. Validate event schema.
  2. GET /api/v1/instances/{instance_id} against the Model Registry.
  3. Receive full descriptor:
    {
      "id": "ranking-model-v5-20XX0101",
      "pipeline_run_id": "train-weekly-ranking-20XX0101",
      "owner_emails": ["alice@netflix.com"],
      "labels": [{"key": "team", "value": "personalization"}],
      ...
    }
    
  4. Normalize to AIP URI form, persist to Datomic + Elasticsearch.

Each source system gets its own event handler; the consumer-side shape is uniform.
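The four consumer-side steps can be sketched as follows. This is a minimal illustration, not Netflix's code: it assumes a hypothetical `fetch` callable playing the role of `GET /api/v1/instances/{instance_id}` and an in-memory dict standing in for Datomic + Elasticsearch. The normalized shape produced by the hypothetical `normalize` function is invented for illustration; the source does not describe the AIP URI form.

```python
import json
from typing import Callable

# Hypothetical stand-in for GET /api/v1/instances/{instance_id} against the
# Model Registry: takes an instance_id, returns the full descriptor.
Fetcher = Callable[[str], dict]

def validate(raw: str) -> dict:
    """Step 1: parse the thin event and check it carries the expected fields."""
    event = json.loads(raw)
    for field in ("event_type", "instance_id"):
        if not isinstance(event.get(field), str):
            raise ValueError(f"missing or non-string field: {field}")
    return event

def normalize(descriptor: dict) -> dict:
    """Step 4a: map the source descriptor into the consumer's own model.
    Hypothetical target shape; the real AIP URI form is not shown in the source."""
    return {
        "id": descriptor["id"],
        "pipeline_run": descriptor.get("pipeline_run_id"),
        "owners": sorted(descriptor.get("owner_emails", [])),
        "labels": {lab["key"]: lab["value"] for lab in descriptor.get("labels", [])},
    }

def handle_model_instance(event: dict, fetch: Fetcher, store: dict) -> None:
    descriptor = fetch(event["instance_id"])             # Steps 2-3: hydrate from the source
    store[event["instance_id"]] = normalize(descriptor)  # Step 4b: persist locally

# One handler per event type; every handler has the same shape.
HANDLERS = {"model_instance_created": handle_model_instance}

def consume(raw: str, fetch: Fetcher, store: dict) -> None:
    event = validate(raw)
    HANDLERS[event["event_type"]](event, fetch, store)  # unknown types raise KeyError

# Usage with a fake registry holding the descriptor from the example above.
registry = {"ranking-model-v5-20XX0101": {
    "id": "ranking-model-v5-20XX0101",
    "pipeline_run_id": "train-weekly-ranking-20XX0101",
    "owner_emails": ["alice@netflix.com"],
    "labels": [{"key": "team", "value": "personalization"}],
}}
store: dict = {}
consume('{"event_type": "model_instance_created", '
        '"instance_id": "ranking-model-v5-20XX0101"}',
        registry.__getitem__, store)
```

Injecting `fetch` and `store` keeps each handler testable against fakes; a production consumer would add the retry, rate-limiting, and 404 handling listed in the implementation checklist.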

Structural property: order-independence

Because the event payload carries no state, redelivered, duplicated, or out-of-order events are harmless: each one only triggers a fresh read of the current source state, so processing is naturally idempotent and order-independent:

"This design has a crucial property: the order of events doesn't matter. MDS always fetches the latest facts from the source of truth. … If the event bus drops a message or delivers it out of order, the next event corrects the state." (sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph)

Whichever hydration runs last reads the latest source state. A dropped event is corrected by the next event for the same entity; until that event arrives, the local copy stays stale. This removes a large class of distributed-systems consistency bugs.
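The order-independence property can be checked with a small in-memory simulation. Everything here is hypothetical: a dict plays the source of truth, the bus delivers events late, out of order, with one duplicate and one drop, and a contrasting consumer applies state carried inside "fat" events instead of hydrating. The sketch assumes every hydration runs after the final source write; a hydration that reads earlier would normally be corrected by the event that the later write emits.

```python
# Hypothetical source of truth plus two consumer-side views.
source_of_truth: dict = {}
thin_view: dict = {}  # consumer using thin event + hydration
fat_view: dict = {}   # contrast: consumer applying state carried in each event

thin_events, fat_events = [], []
for version in (1, 2, 3):
    state = {"version": version}
    source_of_truth["m1"] = state  # the source changes, then emits an event
    thin_events.append({"instance_id": "m1"})
    fat_events.append({"instance_id": "m1", "state": state})

def hydrate(event: dict) -> None:
    """Ignore the payload; copy whatever the source says now."""
    entity_id = event["instance_id"]
    thin_view[entity_id] = dict(source_of_truth[entity_id])

def apply_fat(event: dict) -> None:
    """Apply the state carried in the event: the last one delivered wins."""
    fat_view[event["instance_id"]] = dict(event["state"])

# The bus reorders (event 2 first), duplicates event 0, and drops event 1.
for i in [2, 0, 0]:
    hydrate(thin_events[i])
    apply_fat(fat_events[i])

print(thin_view["m1"])  # {'version': 3}
print(fat_view["m1"])   # {'version': 1}
```

The hydrating consumer converges on the source's current state, while the consumer that trusts event payloads ends on whatever stale state happened to arrive last.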

Implementation checklist

To implement this pattern correctly:

  • Each producer-source pair has a documented thin-event schema (entity identifier + event type + minimum metadata).
  • Each consumer-source pair has a documented hydration contract (which API to call, how to handle 404s, retries).
  • Rate limiting on the consumer side per source — "deliberate about rate limiting, caching, and backoff in our enrichment workers so that we don't overload them" (Netflix MDS).
  • Coalescing for hot entities — if multiple events for the same entity arrive in quick succession, hydrate once.
  • Caching at the consumer with a short TTL to absorb bursts.
  • Backoff + retry on source-side throttling (HTTP 429 / 503).
  • Bulk hydration if the source supports it.
  • 404 / "entity gone" handling — entity may have been deleted between event emission and hydration call; treat as retraction.
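Three of the checklist items (coalescing hot entities, backoff on HTTP 429 / 503, and treating 404 as a retraction) can be sketched together. This assumes a hypothetical source client that raises `SourceError` carrying the HTTP status; the retry limit and delays are illustrative, and per-source rate limiting, caching, and bulk hydration are left out.

```python
import time
from typing import Callable, Optional

class SourceError(Exception):
    """Hypothetical error raised by the source client, carrying the HTTP status."""
    def __init__(self, status: int):
        super().__init__(f"source returned HTTP {status}")
        self.status = status

RETRYABLE = {429, 503}  # source-side throttling or temporary unavailability

def fetch_with_backoff(fetch: Callable[[str], dict], entity_id: str,
                       max_attempts: int = 5, base_delay: float = 0.1,
                       sleep: Callable[[float], None] = time.sleep) -> Optional[dict]:
    """Hydrate one entity; return None if the source reports it gone (404)."""
    for attempt in range(max_attempts):
        try:
            return fetch(entity_id)
        except SourceError as err:
            if err.status == 404:
                return None  # deleted between emission and hydration
            if err.status not in RETRYABLE or attempt == max_attempts - 1:
                raise  # non-retryable, or out of attempts
            sleep(base_delay * 2 ** attempt)  # exponential backoff: 0.1s, 0.2s, 0.4s, ...
    raise ValueError("max_attempts must be at least 1")

def process_batch(events: list, fetch, local: dict, **kw) -> int:
    """Coalesce a batch: hydrate each distinct entity once, however many events name it."""
    entity_ids = dict.fromkeys(e["instance_id"] for e in events)  # dedupe, keep order
    for entity_id in entity_ids:
        descriptor = fetch_with_backoff(fetch, entity_id, **kw)
        if descriptor is None:
            local.pop(entity_id, None)  # treat 404 as a retraction
        else:
            local[entity_id] = descriptor
    return len(entity_ids)  # number of distinct entities hydrated

# Usage with a fake source: "busy" is throttled twice, "gone" was deleted.
calls = []
def fake_fetch(entity_id: str) -> dict:
    calls.append(entity_id)
    if entity_id == "gone":
        raise SourceError(404)
    if entity_id == "busy" and calls.count("busy") < 3:
        raise SourceError(429)
    return {"id": entity_id}

local = {"gone": {"id": "gone"}}
events = [{"instance_id": i} for i in ["m1", "m1", "busy", "m1", "gone"]]
hydrated = process_batch(events, fake_fetch, local, sleep=lambda s: None)
```

Five events collapse into three hydrations, the throttled entity succeeds on its third attempt, and the deleted entity is removed from the local copy rather than left stale. Injecting `sleep` keeps the example fast to run.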

Distinct from CDC (change-data-capture)

| Aspect | CDC | Thin event + hydration |
|---|---|---|
| Event payload | Row data (before + after) | Just an identifier |
| Tap point | Database transaction log | Application-emitted event |
| Source-side cost | Log-based, low | Producer emits + responds to hydration calls |
| Consumer-side source-API load | Zero | Read amplification |
| Consumer-source coupling | Tight to source DB schema | Loose: only to the source API |
| Filtering | Pipeline-side or consumer-side | Source-side (in the API response) |

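CDC carries the data; hydration carries only a pointer. CDC is heavier on the wire and ties consumers to the source's database schema, but consumers never call the source at runtime. Hydration is lighter on the wire and ties consumers only to the source's API contract, but every event creates a runtime dependency on that API and adds read load to it.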

See concepts/change-data-capture.

Distinct from event sourcing

In event sourcing, the event log is the system of record. In this pattern, the event log is a trigger and the source-system API is the system of record. Replay semantics are inverted:

  • Event sourcing: replay reconstructs state by re-applying every event.
  • Thin event + hydration: replay re-fetches current state, ignoring history.

Event sourcing requires events to be applied in strict order (at least per entity); this pattern is order-independent.
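The inverted replay semantics can be contrasted in a few lines. This is a hypothetical sketch: the `owner_added` / `owner_removed` event types, the `apply` fold, and the `current_source` snapshot are invented for illustration. Event sourcing folds every event in order, so reordering changes the result; hydration-based replay only re-fetches each entity's current state, so history and order do not matter.

```python
from functools import reduce

# Event sourcing: the log is the record, so state is a left fold over every event.
def apply(state: dict, event: dict) -> dict:
    owners = state.get("owners", [])
    if event["type"] == "owner_added":
        return {**state, "owners": owners + [event["owner"]]}
    if event["type"] == "owner_removed":
        return {**state, "owners": [o for o in owners if o != event["owner"]]}
    return state

log = [
    {"type": "owner_added", "owner": "alice"},
    {"type": "owner_added", "owner": "bob"},
    {"type": "owner_removed", "owner": "alice"},
]
in_order = reduce(apply, log, {})                        # {'owners': ['bob']}
reordered = reduce(apply, [log[2], log[0], log[1]], {})  # {'owners': ['alice', 'bob']}

# Thin event + hydration: replay only says "look again"; history is ignored.
def replay_by_hydration(entity_ids, fetch) -> dict:
    # dict.fromkeys dedupes while keeping first-seen order, so each entity is fetched once.
    return {eid: fetch(eid) for eid in dict.fromkeys(entity_ids)}

current_source = {"m1": {"owners": ["bob"]}}  # hypothetical source-of-truth snapshot
thin_log = ["m1", "m1", "m1"]                 # three notifications, no payload
hydrated = replay_by_hydration(reversed(thin_log), current_source.__getitem__)
```

Replaying the event-sourced log out of order resurrects a removed owner; replaying the thin log in any order yields the source's current state.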

When to use it

  • The source system has a well-defined query API capable of absorbing read load.
  • Producers are diverse / changing / can't be expected to know every consumer's schema needs.
  • The event bus may drop / reorder messages, and you want correctness despite that.
  • You are building a catalog / metadata service / search index that mirrors authoritative state from elsewhere.

When not to use it

  • The source system can't tolerate the read amplification.
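  • You need exact change-deltas (financial reconciliation, audit trails): hydration returns only the latest state, so intermediate changes between two fetches are never observed.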
  • Consumer needs millisecond-latency reaction to changes — hydration adds round-trip latency.
  • The source has no query API (only emits events).

Seen in
