CONCEPT Cited by 1 source
Enrichment execution engine¶
Definition¶
An enrichment execution engine is the shared runtime substrate that turns raw events into enriched records based on configuration — "reading configuration, connecting to data sources (kafka, logs, tables, feature stores), running filtering and featurization, calling enrichment services or joining against offline tables, and finally writing enriched results to storage" (Source: sources/2026-05-21-pinterest-making-user-sequence-data-more-cost-efficient-faster-and-easier-to-use).
It composes two distinct layers:
- The engine / framework — owns IO substrate, concurrency, retries, backpressure, configuration parsing + validation, sources + sinks. Cross-cutting concerns identical for every event type.
- The executor / plugin — owns business-specific filtering, featurisation, and the raw → normalised mapping for a particular event type or grouping. "In plain terms, the executor is the 'business logic module' for a particular event type or grouping, while the execution engine handles everything around it."
This is the architectural primitive that makes one definition, many runtimes mechanically realisable: the same engine + the same executors run in both streaming and batch jobs, so the "shared definition" compiles to literally the same execution logic across runtimes — only the scheduling shape differs.
Why the framework / plugin separation¶
Three distinct teams operate at three distinct lifecycles:
| Layer | Owner | Cadence | Concerns |
|---|---|---|---|
| Framework | Platform team | Quarterly / yearly | Throughput, reliability, multi-runtime parity, observability |
| Executor | ML / product team | Per event-type / per signal | Filter logic, enrichment shape, feature semantics |
| Configuration | ML / product team | Per launch | Which sources, which filters, which enrichments wire up |
Without the separation, every event-type change becomes a pipeline-stack change; every framework change risks breaking individual event-type semantics. Pinterest names this explicitly: "To keep the system maintainable, we drew a clear line between framework and plugin code."
Pluggable-executor contract¶
A typical executor consumes one event-shape and produces one or more enriched records. The minimal contract:
- Input: raw event from a source connector (Kafka topic, warehouse table row, log archive entry).
- Filter: predicate(s) deciding whether this raw event should produce any output.
- Featurise: extract / compute the per-event payload.
- Enrich: dispatch to enrichment services or join against offline tables.
- Output: normalised user-event record(s) ready for assembly.
The executor is intentionally scoped to per-event-type business logic rather than per-pipeline orchestration. If the executor needs to know about retries, source partitioning, or sink batching, the framework / plugin boundary has been violated.
Streaming + batch from the same code¶
The engine's payoff:
"The shared engine allowed us to reuse the same core enrichment logic in both streaming jobs that handle near-real-time events and batch jobs that process historical data. That minimized code duplication and reduced drift between batch and real-time behavior."
This is the structural cure for streaming-vs-batch drift, the historical Achilles heel of lambda architecture. Classical lambda required maintaining two code paths; the enrichment execution engine collapses that to two scheduling shapes of one code path.
What lives where¶
┌───────────────────────────────────────────────────────────┐
│ Engine / framework (platform team owns) │
│ ───────────────────────────────────────── │
│ • source connectors (Kafka, warehouse, log archive, FS) │
│ • sink writers (columnar storage, online store) │
│ • concurrency, parallelism, sharding │
│ • retries, idempotency, backpressure │
│ • config parser + validator │
│ • observability (metrics, lag, freshness, errors) │
│ • multi-runtime parity (streaming + batch + serving) │
└───────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────┐
│ Executors / plugins (ML / product team owns) │
│ ───────────────────────────────────────── │
│ • per-event-type filter predicates │
│ • per-event-type featurisation │
│ • per-event-type raw → normalised mapping │
│ • per-event-type enrichment dispatch │
└───────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────┐
│ Configuration (ML / product team owns) │
│ ───────────────────────────────────────── │
│ • which sources │
│ • which executors │
│ • which enrichments │
│ • output schema + retention │
└───────────────────────────────────────────────────────────┘
Application: Pinterest user-sequence platform¶
Pinterest's user-sequence platform uses an enrichment execution engine to power three runtimes (real-time indexer, batch indexer + backfill, online serving API) from one definition surface. All event types — across teams, models, and product surfaces — flow through the same engine + their own executors.
Sibling architectures¶
- Spark / Flink as substrate — neither is itself an enrichment execution engine in this sense; both are the IO + concurrency runtime layer. An enrichment execution engine can be implemented on top of Spark or Flink, with the executor abstraction added as the per-event-type plugin layer.
- Feature stores — typically own materialisation, not the enrichment logic. An enrichment execution engine sits upstream and feeds a feature store (or directly produces feature-store-shaped tables).
- dbt / SQL pipelines — declarative-SQL substrates where the "definition" is the SQL itself. They cover batch but not streaming + serving from the same definition; the enrichment execution engine pattern targets the multi-runtime case.
Caveats¶
- The framework / plugin split is easy to violate. Pinterest names it explicitly because it's the gate that has to be defended each time a new requirement lands. Framework features that leak event-type knowledge make every event-type change harder; executors that reach into IO substrate make framework upgrades risky.
- Executor uniformity is hard to enforce. Different event types may need slightly different output shapes / ordering guarantees / cardinality contracts. The engine has to expose just enough of these as configuration without re-creating the framework / plugin boundary leak.
- Observability is engine-owned but per-executor. Lag, throughput, error rates need to be tagged with executor identity to be useful — generic engine metrics aren't enough for tenant-level debugging.
Seen in¶
- sources/2026-05-21-pinterest-making-user-sequence-data-more-cost-efficient-faster-and-easier-to-use — first canonical wiki disclosure with the framework / executor / configuration triangle named explicitly.