
PATTERN Cited by 2 sources

Signal-based publish/subscribe step triggering

Summary

In a workflow orchestrator, unify inter-workflow triggering and intra-workflow step dependency behind a single signal primitive. Signals are messages carrying parameter values, produced either by step outputs or by external systems; they unblock subscribing steps or start subscribing workflows when their match criteria are met.

Canonical wiki instance: Netflix Maestro's signal mechanism (Source: sources/2024-07-22-netflix-maestro-netflixs-workflow-orchestrator).

Problem

Workflow orchestrators have historically used separate mechanisms for triggering:

  • Intra-workflow dependency: DAG edges, conditional-branch predicates.
  • Inter-workflow triggering from internal producers: Airflow ExternalTaskSensor, Step Functions invoking other state machines.
  • Inter-workflow triggering from external producers: webhook handlers, SNS subscribers, Kafka consumers written as custom glue.

Each mechanism has its own semantics (at-most-once vs at-least-once vs exactly-once), its own failure modes, its own audit trail, and its own configuration surface. Cross-team integration typically requires wiring two or more of these together.

Solution

One signal primitive, with three properties:

  1. Dual semantics — the same signal serves both pub-sub (one producer unblocks many consumers) and trigger (external event starts a workflow).
  2. Parameter-carrying matching — signals carry parameter values; subscribers specify mapped-field subsets + operators (<, >, =, etc.) for matching.
  3. Exactly-once per match — subscribers run exactly once per matching signal or joined-signal set, regardless of substrate delivery guarantees.
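The parameter-carrying matching in property 2 can be sketched as a predicate over signal parameters. This is an illustrative model, not Maestro's actual API: the `matches` function, the operator table, and the criteria shape are all assumptions.

```python
# Hypothetical sketch of parameter-carrying signal matching (not Maestro's
# real API). A subscription maps each field of interest to an (operator,
# value) pair; a signal matches when every mapped field satisfies its
# predicate.
import operator

OPS = {"<": operator.lt, ">": operator.gt, "=": operator.eq,
       ">=": operator.ge, "<=": operator.le}

def matches(signal_params: dict, criteria: dict) -> bool:
    """criteria maps field -> (op, value); all mapped fields must match."""
    return all(
        field in signal_params and OPS[op](signal_params[field], value)
        for field, (op, value) in criteria.items()
    )

signal = {"table": "X", "partition": 20240722}
criteria = {"table": ("=", "X"), "partition": (">=", 20240101)}
assert matches(signal, criteria)                 # subscriber unblocked
assert not matches(signal, {"table": ("=", "Y")})  # different table: no match
```

Subscribers only name the fields they care about; extra parameters on the signal are ignored, which keeps producers free to evolve their payloads.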

Source coverage

Signals can be produced by:

  • Internal producers — step outputs within the same or another workflow.
  • External producers — external systems (SNS, Kafka, HTTP webhooks).

This closes the producer-origin gap — the same primitive handles within-workflow, cross-workflow, and cross-system triggering uniformly.

Signal lineage

"Maestro supports 'signal lineage,' which allows users to navigate all historical instances of signals and the workflow steps that match (i.e. publishing or consuming) those signals."

The orchestrator persists a queryable history of producer / consumer / match events — essential for debugging ("why didn't my downstream trigger?") + audit ("what flows consumed this table update?").
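A minimal model of such a lineage store, under assumed names (the event kinds and store shape are illustrative, not Maestro's implementation): every publish, match, and consume event is appended to a queryable log.

```python
# Illustrative lineage store (assumed shape, not Maestro's implementation).
# Persisting every publish / match / consume event lets you answer
# "why didn't my downstream trigger?" by querying one signal's history.
from dataclasses import dataclass, field

@dataclass
class LineageStore:
    events: list = field(default_factory=list)

    def record(self, signal_id: str, kind: str, actor: str) -> None:
        # kind is one of: "published", "matched", "consumed"
        self.events.append({"signal": signal_id, "kind": kind, "actor": actor})

    def history(self, signal_id: str) -> list:
        return [e for e in self.events if e["signal"] == signal_id]

store = LineageStore()
store.record("table_X/p=20240722", "published", "etl_workflow.step3")
store.record("table_X/p=20240722", "matched", "ml_training_workflow")
store.record("table_X/p=20240722", "consumed", "ml_training_workflow")
print(len(store.history("table_X/p=20240722")))  # 3 events for this signal
```

An absent "matched" event for a given signal points at the subscription criteria; a "matched" without "consumed" points at the consumer side.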

Exactly-once guarantee

"Signal triggering guarantees exactly-once execution for the workflow subscribing a signal or a set of joined signals."

See concepts/exactly-once-signal-trigger for the architectural shape — orchestrator-level dedup state over an at-least-once substrate.
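The dedup shape can be sketched as follows. The key structure and function names are assumptions; the point is that re-deliveries from an at-least-once substrate are matched again but trigger the consumer at most once per (consumer, match) key.

```python
# Sketch of orchestrator-level dedup over an at-least-once substrate
# (the dedup key and in-memory set are assumptions; a real orchestrator
# would persist this state transactionally).
triggered: set = set()  # keys already acted on

def deliver(consumer: str, signal_id: str) -> bool:
    """Return True iff the consumer is actually started for this match."""
    key = (consumer, signal_id)
    if key in triggered:       # duplicate delivery from the substrate
        return False
    triggered.add(key)         # must be atomic with starting the consumer
    return True

assert deliver("workflow_B", "table_X/p=1") is True   # first delivery runs
assert deliver("workflow_B", "table_X/p=1") is False  # redelivery deduped
assert deliver("workflow_C", "table_X/p=1") is True   # other consumer unaffected
```

The guarantee is per (consumer, matching signal or joined-signal set), so one signal can still legitimately trigger many distinct subscribers.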

When to use

  • Multi-team workflow integration — signals become the organisational protocol without N² wiring.
  • Data-pipeline trigger — ETL workflow produces a signal with the partition key; downstream ML/reporting workflows subscribe to specific partitions.
  • Event-driven ML retraining — new data signals trigger retraining workflows conditional on sufficient accumulated data.
  • External-event-driven workflows — webhook / Kafka event → signal → workflow trigger with exactly-once.

When not to use

  • Tight intra-team dependencies — for a small set of workflows owned by one team, DAG edges may be simpler.
  • Real-time triggering — signals add latency; sub-second triggering is better served by native Kafka consumers.
  • Orchestrators that don't support exactly-once — the pattern becomes dangerous if the substrate is at-least-once and the orchestrator doesn't add dedup.

Structure

   External events              Internal step outputs
   (SNS / Kafka / HTTP)         (workflow A, step 3)
            │                          │
            ▼                          ▼
       ┌─────────────────────────────────────────┐
       │  Maestro signal service                 │
       │  · persist signal (lineage)             │
       │  · match against subscribers            │
       │  · dedup per (consumer, mapped-key)     │
       └─────────────────────────────────────────┘
               ┌──────────┼──────────┐
               ▼          ▼          ▼
        ┌──────────┐ ┌─────────┐ ┌──────────┐
        │ workflow │ │ step X  │ │ workflow │
        │ B start  │ │ unblock │ │ C start  │
        └──────────┘ └─────────┘ └──────────┘

Example (from the post)

An ETL workflow updates a table and publishes a signal containing the partition key. Three downstream consumers subscribe to signals matching table=X AND partition>=Y:

  • Team B's ML training workflow — triggers exactly once per partition match.
  • Team C's reporting workflow — triggers exactly once per partition match.
  • Team D's audit-sampling step — unblocks its step when a threshold number of partitions is available.

None of these teams knows about the others; the signal protocol is the integration contract.
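The fan-out above can be wired up as in this sketch (all identifiers are illustrative): one published signal is checked against each team's independently-declared subscription, and each matching consumer is started exactly once.

```python
# Hypothetical wiring for the example above (all names illustrative).
# One signal fans out to three independent subscriptions; no subscription
# references any other consumer.
signal = {"table": "X", "partition": 42}

subscriptions = {
    "team_B_ml_training": lambda s: s["table"] == "X" and s["partition"] >= 40,
    "team_C_reporting":   lambda s: s["table"] == "X" and s["partition"] >= 40,
    "team_D_audit_step":  lambda s: s["table"] == "X" and s["partition"] >= 40,
}

# Each subscriber whose predicate matches is triggered once for this signal.
started = [name for name, pred in subscriptions.items() if pred(signal)]
print(started)  # all three consumers, each started exactly once
```

Adding a fourth consumer is a new entry in `subscriptions`; neither the producer nor the existing consumers change.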
