CONCEPT Cited by 2 sources
Event-triggering orchestration¶
Event-triggering orchestration is the architectural stance that workflow-to-workflow integration across teams should happen via a shared event / signal protocol — not via point-to-point dependencies, scheduled start times, or manual handoffs.
The canonical framing comes from Netflix, which calls this "a hugely important detail that often goes overlooked" in the Metaflow-at-Netflix post (Source: sources/2024-07-22-netflix-supporting-diverse-ml-systems-at-netflix):
"It allows a team to integrate their Metaflow flows to surrounding systems upstream (e.g. ETL workflows), as well as downstream (e.g. flows managed by other teams), using a protocol shared by the whole organization."
The alternative shapes (and why they break at scale)¶
Point-to-point API calls¶
Team A's workflow calls Team B's webhook when done. Cost: N² wiring as more teams integrate, no replay, no shared audit, and the protocol is Team A's choice (not a shared contract).
Scheduled start times¶
Team B schedules their workflow to start at 3am because Team A's ETL finishes at 2:45am. Cost: schedule drift breaks everything; any variability in Team A's runtime cascades; no correctness contract.
Manual handoff¶
Team A pings Team B in Slack when the run finishes. Cost: handoffs get missed or delayed, there is no audit trail, and humans scale worst of all.
Why event-triggering works¶
- Decouples producer + consumer identity — Team A produces a signal; Team B (and C, and D…) subscribe. Team A doesn't know who subscribes; Team B doesn't know which specific workflow produced it.
- Carries data, not just timing — signals carry parameter values (partition keys, sizes, identities) that downstream consumers use to parameterise their execution.
- Uniform protocol — one team's workflow can subscribe to any team's outputs without per-team integration code.
- Composable with exactly-once (see concepts/exactly-once-signal-trigger) — external event sources are typically at-least-once, but the orchestrator can provide exactly-once trigger guarantees on top.
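The decoupling and data-carrying properties above can be sketched with a minimal in-process pub/sub (illustrative only — not Maestro's or Metaflow's actual API; all names here, including `EventBus` and the `etl.daily_table.ready` event, are hypothetical):

```python
from collections import defaultdict

class EventBus:
    """Minimal shared signal protocol: producers publish named events
    with parameter payloads; any number of subscribers react."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_name, callback):
        # Team B registers interest without knowing who produces the event.
        self._subscribers[event_name].append(callback)

    def publish(self, event_name, **params):
        # Team A emits a signal without knowing who consumes it; the
        # payload (e.g. a partition key) parameterises downstream runs.
        for callback in self._subscribers[event_name]:
            callback(params)

bus = EventBus()
runs = []
# Teams B and C subscribe independently to Team A's output signal.
bus.subscribe("etl.daily_table.ready", lambda p: runs.append(("train_model", p["partition"])))
bus.subscribe("etl.daily_table.ready", lambda p: runs.append(("refresh_report", p["partition"])))
# Team A's ETL publishes once when it finishes; both downstream workflows start,
# parameterised by the same partition — no point-to-point wiring, no schedule.
bus.publish("etl.daily_table.ready", partition="2024-07-22")
```

Note the asymmetry that makes this scale: adding Team D is one `subscribe` call on Team D's side, with zero changes to Team A's workflow.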
Canonical Netflix instance: Content Decision Making¶
The Metaflow-at-Netflix post names Content Decision Making as the flagship event-triggered workflow on Maestro:
"A flow graph of hundreds of models and intricate business logic supporting content decisions for 260M+ subscribers across 190+ countries, managed by a relatively small team of engineers and data scientists autonomously."
The "relatively small team managing the whole thing autonomously" framing is the load-bearing payoff — event-triggering enables small teams to coordinate across a large graph of dependent workflows without linear operational cost.
Mechanism: signals¶
Maestro's signal primitive is the concrete mechanism — see concepts/signal-based-step-dependency for the full architecture. The relevant properties at this altitude:
- Signals produced by step outputs (internal) or external systems (SNS / Kafka).
- Signals carry parameter values matched against subscriber criteria (a mapped subset of fields plus operators `<`, `>`, `=`).
- Exactly-once trigger guarantee per signal or joined-signal set.
- Signal lineage — queryable producer/consumer history for audit + debugging.
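Criteria matching plus exactly-once can be sketched in a few lines — a toy model, not Maestro's implementation; the function names (`matches`, `trigger_once`), the criteria tuple shape, and the signal id scheme are all assumptions for illustration:

```python
# Subscriber criteria: a mapped field subset, each with an operator (<, >, =).
OPS = {"=": lambda a, b: a == b, "<": lambda a, b: a < b, ">": lambda a, b: a > b}

def matches(signal_params, criteria):
    """Check a signal's parameter values against subscriber criteria."""
    return all(
        field in signal_params and OPS[op](signal_params[field], value)
        for field, op, value in criteria
    )

seen = set()  # stand-in for the orchestrator's durable dedup store

def trigger_once(signal_id, signal_params, criteria):
    """Fire at most once per signal id, even when the (at-least-once)
    external event source delivers the same signal repeatedly."""
    if signal_id in seen or not matches(signal_params, criteria):
        return False
    seen.add(signal_id)
    return True

criteria = [("table", "=", "plays"), ("row_count", ">", 1000)]
first = trigger_once("sig-42", {"table": "plays", "row_count": 5000}, criteria)  # fires
dup = trigger_once("sig-42", {"table": "plays", "row_count": 5000}, criteria)    # deduplicated
```

The dedup-by-signal-id step is what converts an at-least-once delivery source into an exactly-once trigger guarantee for the subscriber.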
Why this is often missed¶
Engineering teams building workflow orchestrators often focus on the intra-workflow experience — DAG editing, step runtime, retry logic, parameter passing — and treat inter-workflow integration as "a different problem." Event-triggering orchestration flips this: the inter-workflow protocol is a first-class orchestrator feature, not an afterthought, because that's where the organisational scaling happens.
Seen in¶
- sources/2024-07-22-netflix-supporting-diverse-ml-systems-at-netflix — the cross-team-protocol framing
- sources/2024-07-22-netflix-maestro-netflixs-workflow-orchestrator — the signal-based mechanism