PATTERN Cited by 1 source

Internal → external event pipeline

Summary

When a control-plane service must notify both internal subscribers (to drive its own state machine) and external subscribers (downstream services in a different domain), split the event pipeline into two tiers: internal events consumed by the engine itself, and external events, derived from the internal ones by a transformation step, emitted to a public queue such as Kafka or SNS.

Canonical wiki instance: Netflix Maestro's event publishing architecture (Source: sources/2024-07-22-netflix-maestro-netflixs-workflow-orchestrator).

Problem

Control-plane services — workflow orchestrators, deployment systems, config managers — have two distinct classes of event consumer:

  1. Internal state machine — the engine's own retry / fan-out / rollup / aggregation logic needs to react to state changes. These events are high volume (every state transition of every workflow / instance / step), low latency (milliseconds), and tightly coupled to the engine's internal schema.
  2. External downstream systems — dashboards, alerting, compliance audits, downstream orchestrators, reporting pipelines. These consumers need a stable schema, care only about significant events (not every internal state-machine tick), and can tolerate seconds of delay.

Exposing the internal event schema directly to external consumers creates two problems:

  • Tight coupling — external consumers break every time the engine's internal state machine evolves.
  • Noise — external consumers see every micro-transition, burying the signal they care about.
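
Both problems can be illustrated with a small sketch. The event shapes, field names (`wf_id`, `state`, `phase`), and state names below are hypothetical, chosen for illustration; they are not Maestro's actual schema.

```python
# Hypothetical internal events; field and state names are illustrative only.
internal_v1 = [
    {"wf_id": "wf-1", "state": "QUEUED"},
    {"wf_id": "wf-1", "state": "DISPATCHED"},
    {"wf_id": "wf-1", "state": "RUNNING"},
    {"wf_id": "wf-1", "state": "HEARTBEAT"},
    {"wf_id": "wf-1", "state": "SUCCEEDED"},
]

# States an external consumer (say, an alerting service) actually cares about.
SIGNIFICANT = {"SUCCEEDED", "FAILED"}


def external_consumer(event):
    """A consumer written directly against the internal schema."""
    return event["state"] in SIGNIFICANT


# Noise: only one of the five internal events is relevant to this consumer.
relevant = [e for e in internal_v1 if external_consumer(e)]
noise_ratio = 1 - len(relevant) / len(internal_v1)

# Tight coupling: suppose the engine renames "state" to "phase" in a refactor.
# The consumer breaks even though nothing it cares about has changed.
internal_v2 = [{"wf_id": e["wf_id"], "phase": e["state"]} for e in internal_v1]
try:
    external_consumer(internal_v2[-1])
    broke = False
except KeyError:
    broke = True
```

In this toy run four of the five events are noise for the consumer, and a single field rename in the engine is enough to break it.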

Solution

Two queues, one processor:

  1. Internal queue — the engine publishes every lifecycle event here. Consumed by the engine itself + by the event processor.
  2. Event processor — consumes internal events, classifies them by type, transforms relevant ones into external-schema events.
  3. External queue (Kafka / SNS) — receives only the transformed external events. Downstream services consume from here with a stable contract.
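
The three pieces above can be sketched in a few lines. This is a minimal in-memory illustration, not Maestro's implementation: the `deque` objects stand in for real queues, and the class names, event kinds, and payload keys (`wf`, `to`) are assumptions made for the example. In a real deployment the engine would read the internal queue through its own consumer rather than sharing one drain loop with the processor.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InternalEvent:
    # Engine-internal shape; free to change between engine releases.
    kind: str       # hypothetical kinds, e.g. "STEP_HEARTBEAT", "INSTANCE_STATUS"
    payload: dict


@dataclass(frozen=True)
class ExternalEvent:
    # Stable contract for downstream consumers.
    event_type: str
    workflow_id: str
    status: str


def transform(event: InternalEvent) -> Optional[ExternalEvent]:
    """Classify by type; return None for events with no external meaning."""
    if event.kind == "INSTANCE_STATUS":
        return ExternalEvent(
            event_type="instance_status_change",
            workflow_id=event.payload["wf"],
            status=event.payload["to"],
        )
    return None


def run_processor(internal_queue: deque, external_queue: deque) -> None:
    """Drain the internal queue, forwarding only transformed events."""
    while internal_queue:
        external = transform(internal_queue.popleft())
        if external is not None:
            external_queue.append(external)


internal_queue = deque([
    InternalEvent("STEP_HEARTBEAT", {"wf": "wf-1", "step": "s1"}),
    InternalEvent("INSTANCE_STATUS", {"wf": "wf-1", "to": "SUCCEEDED"}),
    InternalEvent("RETRY_SCHEDULED", {"wf": "wf-2", "step": "s3"}),
])
external_queue = deque()
run_processor(internal_queue, external_queue)
```

Only the `INSTANCE_STATUS` event reaches `external_queue`. The heartbeat and retry events stay internal, and downstream consumers depend on `ExternalEvent` alone.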

Maestro's shape

"The Maestro event processor bridges the two aforementioned Maestro events. It listens on the internal queue to get the published internal events. Within the processor, the internal job event is processed based on its type and gets converted to an external event if needed. The notification publisher at the end emits the external event so that downstream services can consume." (Source: sources/2024-07-22-netflix-maestro-netflixs-workflow-orchestrator)

Event classification

External events fall into two categories:

  • Workflow change events — definition / properties changed.
  • Instance status change events — workflow-instance or step-instance state transitions.
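
One way to express this classification is a lookup table from internal event kind to external category. The source names only the two categories, so the internal kind names below are hypothetical placeholders.

```python
from enum import Enum
from typing import Optional


class ExternalCategory(Enum):
    WORKFLOW_CHANGE = "workflow_change"
    INSTANCE_STATUS_CHANGE = "instance_status_change"


# Hypothetical internal event kinds mapped to an external category.
# Kinds absent from this table are treated as internal-only.
CATEGORY_BY_KIND = {
    "WORKFLOW_DEFINITION_UPDATED": ExternalCategory.WORKFLOW_CHANGE,
    "WORKFLOW_PROPERTIES_UPDATED": ExternalCategory.WORKFLOW_CHANGE,
    "WORKFLOW_INSTANCE_STATUS": ExternalCategory.INSTANCE_STATUS_CHANGE,
    "STEP_INSTANCE_STATUS": ExternalCategory.INSTANCE_STATUS_CHANGE,
}


def classify(kind: str) -> Optional[ExternalCategory]:
    """Return the external category for an internal kind, or None."""
    return CATEGORY_BY_KIND.get(kind)
```

Keeping the mapping as data rather than branching logic makes it easy to see which internal kinds are exposed externally.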

Structure

   Workflow / instance / step state changes
                  │
                  ▼
         ┌────────────────────────┐
         │ Internal event queue   │  (high-volume, engine-internal schema)
         └────────┬───────────────┘
       ┌──────────┼──────────────┐
       │          │              │
       ▼          ▼              ▼
  Engine     Retry /        Event processor
  scheduler  rollup         (classify + transform)
                                 │
                                 ▼
                     ┌──────────────────────────┐
                     │ External queue           │  (stable schema, lower volume)
                     │ (Kafka / SNS)            │
                     └────────┬─────────────────┘
                ┌─────────────┼─────────────┐
                ▼             ▼             ▼
          Dashboards    Alerting       Downstream
                                       orchestrators

Trade-offs

| Axis | Win | Cost |
|---|---|---|
| Schema stability for external consumers | Internal engine can evolve freely | Event processor must be kept in sync with engine changes |
| Volume separation | External subscribers don't see internal noise | Two queues, two failure modes |
| Ownership | Engine team owns internal events, platform team owns external contract | Extra service (event processor) to run |
| Debuggability | External event stream is auditable + stable | Internal-event stream is engine-specific + harder to observe externally |
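
The "kept in sync" cost can be partly contained with an exhaustiveness check: every internal event kind must be explicitly marked as either externalized or internal-only. The sketch below shows one possible guard, with hypothetical kind names; the source does not describe how Maestro handles this.

```python
from enum import Enum


class InternalKind(Enum):
    # Hypothetical engine event kinds.
    INSTANCE_STATUS = "instance_status"
    STEP_STATUS = "step_status"
    HEARTBEAT = "heartbeat"
    RETRY_SCHEDULED = "retry_scheduled"


# Kinds the processor converts into external events.
EXTERNALIZED = {InternalKind.INSTANCE_STATUS, InternalKind.STEP_STATUS}
# Kinds deliberately kept internal.
INTERNAL_ONLY = {InternalKind.HEARTBEAT, InternalKind.RETRY_SCHEDULED}


def unhandled_kinds(all_kinds, externalized, internal_only):
    """Return kinds the processor has no explicit decision for."""
    return set(all_kinds) - set(externalized) - set(internal_only)


# Passes today; fails as soon as the engine adds a kind without updating
# one of the two sets above.
assert not unhandled_kinds(InternalKind, EXTERNALIZED, INTERNAL_ONLY)
```

Run as a unit test in the processor's build, a check like this surfaces a missed engine change at build time instead of as a silently dropped event.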

When to use this pattern

  • Control-plane services with both internal + external event consumers — workflow orchestrators, deployment systems, config management, build systems.
  • Schema stability is required for external consumers while the internal state machine is evolving rapidly.
  • Event volume differs by orders of magnitude between internal + external needs.

When not to use

  • Single-consumer systems — if there's no separation between internal + external, one queue is simpler.
  • Low-volume systems — the double-queue architecture adds operational complexity that's only worth it at scale.
  • Real-time external consumers — the transformation step adds latency; systems needing <100ms external propagation should fan out directly.
