PATTERN Cited by 1 source
Thin event plus source-of-truth hydration¶
Definition¶
The thin-event-plus-source-hydration pattern is an ingestion shape where:
- Producers emit minimal events containing only an entity identifier and event type — no state.
- Consumers receive the event, validate the schema, and call back to the source system's API to fetch the complete current state.
- Consumers transform the response into their own model and persist it locally.
The event stream is purely a change-notification trigger; authoritative state lives in source systems and is fetched on demand.
"The event stream becomes a notification of change rather than a log of changes." — sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph
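As a minimal sketch of what "no state" means in practice, a consumer can model the thin event as a two-field record and reject any payload that carries more. The field names `entity_id` and `event_type`, the set of event types, and the choice to reject extra fields outright are illustrative assumptions, not taken from the source.

```python
from dataclasses import dataclass

# Hypothetical event types; the source does not enumerate the real ones.
ALLOWED_EVENT_TYPES = {"created", "updated", "deleted"}


@dataclass(frozen=True)
class ThinEvent:
    """A change notification: which entity changed and how, but no state."""
    entity_id: str
    event_type: str


def parse_thin_event(payload: dict) -> ThinEvent:
    """Validate a raw payload and reject anything that smuggles in state."""
    extra = set(payload) - {"entity_id", "event_type"}
    if extra:
        raise ValueError(f"unexpected fields in thin event: {sorted(extra)}")
    entity_id = payload.get("entity_id")
    event_type = payload.get("event_type")
    if not isinstance(entity_id, str) or not entity_id:
        raise ValueError("entity_id must be a non-empty string")
    if not isinstance(event_type, str) or event_type not in ALLOWED_EVENT_TYPES:
        raise ValueError(f"unknown event_type: {event_type!r}")
    return ThinEvent(entity_id=entity_id, event_type=event_type)
```

A real schema would likely allow a small amount of routing metadata as well; the strict check here only makes the "pointer, not payload" idea concrete.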
The Netflix MDS instance¶
Netflix MDS ingests from six source systems via Kafka + SNS/SQS using exactly this pattern.
Producer side: the source system emits a thin event carrying only the entity identifier and event type.
Consumer side, per-event-type hydration contract:
- Validate event schema.
- `GET /api/v1/instances/{instance_id}` against the Model Registry.
- Receive the full descriptor.
- Normalize to AIP URI form, persist to Datomic + Elasticsearch.
Each source system gets its own event handler; the consumer-side shape is uniform.
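The uniform consumer-side shape could be sketched roughly as follows. Everything except the request path is a stand-in: the event field name `instance_id`, the injected `fetch` and `normalize` callables, and the plain dict used in place of the Datomic and Elasticsearch writes are assumptions made so the example is self-contained.

```python
from typing import Callable, Optional

# Hypothetical transport: returns the descriptor dict, or None when the
# source answers 404.
Fetch = Callable[[str], Optional[dict]]
Normalize = Callable[[dict], dict]


def instance_path(instance_id: str) -> str:
    """Request path from the hydration contract for Model Registry instances."""
    return f"/api/v1/instances/{instance_id}"


def handle_event(event: dict, fetch: Fetch, normalize: Normalize, store: dict) -> None:
    """Validate, hydrate, normalize, persist. `store` stands in for the real stores."""
    # 1. Validate the event schema (only the identifier is needed to hydrate).
    instance_id = event.get("instance_id")
    if not isinstance(instance_id, str) or not instance_id:
        raise ValueError("event is missing a usable instance_id")

    # 2. Call back to the source of truth for the current state.
    descriptor = fetch(instance_path(instance_id))

    # 3. The entity may have been deleted since the event was emitted.
    if descriptor is None:
        store.pop(instance_id, None)
        return

    # 4. Transform into the consumer's own model and persist it.
    store[instance_id] = normalize(descriptor)
```

Injecting `fetch` and `normalize` keeps the handler identical across source systems; only the path builder and the normalizer would differ per event type.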
Structural property: order-independence¶
Because the event payload carries no state, replayed or out-of-order delivery is harmless; handling any event just re-fetches current state, so processing is naturally idempotent:
"This design has a crucial property: the order of events doesn't matter. MDS always fetches the latest facts from the source of truth. … If the event bus drops a message or delivers it out of order, the next event corrects the state." — sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph
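Every hydration reads the source state as of the fetch, so whichever event is handled last leaves the consumer with the latest state. A dropped event is corrected by the next event for the same entity (until one arrives, the consumer's copy is stale). This removes a large class of distributed-systems consistency bugs.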
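The order-independence claim can be checked with a toy simulation. This sketch assumes the source state does not change while the events are being handled and that each handler's write lands before the next fetch; `apply_events` and the sample data are hypothetical.

```python
from itertools import permutations


def apply_events(entity_ids, source, local):
    """Handle thin events by copying current source state for each entity."""
    for entity_id in entity_ids:
        if entity_id in source:
            local[entity_id] = source[entity_id]
        else:
            local.pop(entity_id, None)  # entity gone at the source
    return local


# Current authoritative state, and three notifications (two for model-a).
source = {"model-a": {"version": 3}, "model-b": {"version": 7}}
events = ["model-a", "model-b", "model-a"]

# Every delivery order converges to the same local state.
results = [apply_events(order, source, {}) for order in permutations(events)]
all_orders_converge = all(result == source for result in results)

# Dropping one of the duplicate model-a events changes nothing either.
after_drop = apply_events(["model-b", "model-a"], source, {})
```

The simulation does not cover two handlers racing on the same entity, where a slow write of an older fetch could overwrite a newer one; that case needs per-entity serialization or a version check on write.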
Implementation checklist¶
To implement this pattern correctly:
- Each source system has a documented thin-event schema (entity identifier + event type + minimum metadata).
- Each consumer-source pair has a documented hydration contract (which API to call, how to handle 404s, retries).
- Rate limiting on the consumer side per source — "deliberate about rate limiting, caching, and backoff in our enrichment workers so that we don't overload them" (Netflix MDS).
- Coalescing for hot entities — if multiple events for the same entity arrive in quick succession, hydrate once.
- Caching at the consumer with a short TTL to absorb bursts.
- Backoff + retry on source-side throttling (HTTP 429 / 503).
- Bulk hydration if the source supports it.
- 404 / "entity gone" handling — entity may have been deleted between event emission and hydration call; treat as retraction.
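Three of the checklist items (coalescing, backoff on 429 / 503, and treating 404 as a retraction) can be sketched together. The `fetch` signature, the retry limits, and the `"upsert"` / `"retract"` result labels are assumptions chosen for illustration; a production worker would usually add jitter and honor any `Retry-After` header.

```python
import time
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical transport: returns (http_status, body); body is None unless 200.
Fetch = Callable[[str], Tuple[int, Optional[dict]]]

RETRYABLE = {429, 503}


def coalesce(entity_ids: Iterable[str]) -> list:
    """Collapse a burst of events to one hydration per entity, first-seen order."""
    return list(dict.fromkeys(entity_ids))


def hydrate_with_backoff(
    entity_id: str,
    fetch: Fetch,
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, Optional[dict]]:
    """Return ("upsert", body) or ("retract", None); raise when retries run out."""
    for attempt in range(max_attempts):
        status, body = fetch(entity_id)
        if status == 200:
            return "upsert", body
        if status == 404:
            # Deleted between event emission and hydration: treat as retraction.
            return "retract", None
        if status in RETRYABLE:
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)  # exponential backoff, no jitter
            continue
        raise RuntimeError(f"unexpected status {status} for {entity_id}")
    raise RuntimeError(f"gave up on {entity_id} after {max_attempts} attempts")
```

Passing `sleep` as a parameter lets tests record the delays instead of waiting for them.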
Distinct from CDC (change-data-capture)¶
| Aspect | CDC | Thin event + hydration |
|---|---|---|
| Event payload | Row data (before + after) | Just an identifier |
| Tap point | Database transaction log | Application-emitted event |
| Source-side cost | Log-based, low | Producer emits + responds to hydration calls |
| Consumer-side source-API load | Zero | Read amplification |
| Consumer-source coupling | Tight to source DB schema | Loose — only to source API |
| Filtering | Pipeline-side or consumer-side | Source-side (in the API response) |
CDC carries the data; hydration carries only a pointer. CDC is heavier on the wire and couples consumers to the source's database schema, but puts no read load on the source API. Hydration is lighter on the wire and couples consumers only to the source API, at the cost of read amplification against that API.
See concepts/change-data-capture.
Distinct from event sourcing¶
In event sourcing, the event log is the system of record. In this pattern, the event log is a trigger and the source-system API is the system of record. Replay semantics are inverted:
- Event sourcing: replay reconstructs state by re-applying every event.
- Thin event + hydration: replay re-fetches current state, ignoring history.
Event sourcing requires strict ordering; this pattern is order-independent.
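The inverted replay semantics can be illustrated with a small contrast. Both functions and the sample `stage` field are hypothetical; the event-sourced log here uses "set field" events, which is one common shape among many.

```python
def replay_event_sourced(log):
    """Event sourcing: rebuild state by applying every event in log order."""
    state = {}
    for entity_id, field, value in log:
        state.setdefault(entity_id, {})[field] = value
    return state


def replay_thin_events(entity_ids, source):
    """Thin event + hydration: each event just re-reads current source state."""
    state = {}
    for entity_id in entity_ids:
        state[entity_id] = dict(source[entity_id])
    return state


# Event-sourced log: the model moved from staging to prod.
log = [("m1", "stage", "staging"), ("m1", "stage", "prod")]
in_order = replay_event_sourced(log)
out_of_order = replay_event_sourced(log[::-1])

# Thin events: two notifications for the same entity, source already at prod.
source = {"m1": {"stage": "prod"}}
thin = ["m1", "m1"]
forward = replay_thin_events(thin, source)
backward = replay_thin_events(thin[::-1], source)
```

Replaying the event-sourced log in the wrong order leaves `m1` at `"staging"`, while both thin-event replays end at `"prod"`. The trade-off is that the thin-event consumer can never reconstruct that `m1` was once in staging.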
When to use it¶
- The source system has a well-defined query API capable of absorbing read load.
- Producers are diverse / changing / can't be expected to know every consumer's schema needs.
- The event bus may drop / reorder messages, and you want correctness despite that.
- You are building a catalog / metadata service / search index that mirrors authoritative state from elsewhere.
When not to use it¶
- The source system can't tolerate the read amplification.
- You need exact change-deltas (financial reconciliation, audit trails).
- The consumer needs millisecond-latency reaction to changes; hydration adds a round trip to the source on every event.
- The source has no query API (only emits events).
Related patterns¶
- patterns/async-graph-enrichment-job — what to do after hydration to derive cross-entity relationships.
- patterns/dual-store-graph-plus-search-index — typical storage shape for the consumer in this pattern.
- patterns/transactional-outbox — producer-side pattern for guaranteeing the thin event gets emitted with the state change.
Seen in¶
- sources/2026-05-04-netflix-democratizing-machine-learning-building-the-model-lifecycle-graph — Netflix MDS canonicalizes the pattern for ML metadata ingestion across six source systems.