PATTERN

Composite workflow pattern

Summary

Ship three engine-native workflow primitives (foreach, subworkflow, conditional branch) that compose into higher-order workflows. Each primitive has deterministic semantics implemented inside the orchestrator engine (not in user code), and their composition covers common production shapes like auto-recovery, backfills, and ML model tuning.

Canonical wiki instance: Netflix Maestro's three composite primitives (Source: sources/2024-07-22-netflix-maestro-netflixs-workflow-orchestrator).

Problem

Workflow orchestrators handle the "N-step linear pipeline" case well. Production patterns that need more than that include:

  • Data backfills — run the same pipeline across 1000+ historical date partitions.
  • ML hyperparameter sweeps — run the same training workflow across a grid of hyperparameter combinations.
  • Audit-remediate-retry — check the output; if invalid, run a remediation flow, then retry the main workflow.
  • Hierarchical reuse — a complex workflow reused as a component inside other workflows.

Without engine-native primitives, users fall back to either:

  • External drivers — a user script loops + re-triggers the workflow. Loses orchestrator visibility; no unified lineage.
  • Hand-rolled DAG expansion — materialise the full DAG (1,000+ nodes) at definition time. Doesn't scale.
  • One-off scripts — every team reinvents foreach, retry loops, conditional subworkflow calls with subtle bugs.

Solution

Three primitives, implemented in the engine — so they're optimised uniformly, have consistent semantics across the organisation, and compose cleanly.

1. Foreach

  • Models iteration over a parameter collection.
  • "Each iteration of the foreach loop is internally treated as a separate workflow instance, which scales similarly as any other Maestro workflow." (Source: sources/2024-07-22-netflix-maestro-netflixs-workflow-orchestrator)
  • The foreach step monitors + collects per-iteration statuses.
  • Parameters loop_params + loop_index are engine-injected for each iteration.

Canonical use: data backfill, ML hyperparameter tuning, partition processing.
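The foreach semantics above can be sketched in Python. This is an illustrative model of the behaviour the source describes (one workflow instance per iteration, engine-injected `loop_params` and `loop_index`, per-iteration status collection), not Maestro's actual implementation or API:

```python
# Hypothetical sketch of foreach expansion: each element of the parameter
# collection is treated as a separate workflow instance, and the foreach
# step monitors and collects per-iteration statuses.

def expand_foreach(body, loop_params):
    """Run `body` once per element; return per-iteration statuses."""
    statuses = []
    for loop_index, params in enumerate(loop_params):
        # Engine-injected parameters for this iteration.
        context = {"loop_index": loop_index, "loop_params": params}
        try:
            body(context)
            statuses.append("SUCCEEDED")
        except Exception:
            statuses.append("FAILED")
    return statuses

# Example: a backfill over three date partitions.
dates = [{"date": d} for d in ("2024-01-01", "2024-01-02", "2024-01-03")]
results = expand_foreach(lambda ctx: None, dates)
```

Because each iteration is its own instance, the engine is free to run them in parallel and report each one's status independently; the sequential loop here is just the simplest way to show the semantics.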

2. Subworkflow

  • A step runs another workflow — "workflow as a function."
  • Enables a graph of workflows with shared components.
  • Output parameters flow back to the caller.

Canonical use: shared common functions across teams — "complex workflows consisting of hundreds of subworkflows to process data across hundreds of tables, where subworkflows are provided by multiple teams."
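"Workflow as a function" can be modelled as an ordinary call whose return value is the child's output parameters. The names below (`run_subworkflow`, `etl_workflow`) are illustrative, not Maestro's API:

```python
# Hypothetical sketch of the subworkflow primitive: a step runs another
# workflow, and the child's output parameters flow back to the caller.

def etl_workflow(params):
    # A reusable child workflow: returns output parameters to its caller.
    rows = params["rows"]
    return {"row_count": len(rows),
            "audit_passed": all(r is not None for r in rows)}

def run_subworkflow(workflow, params):
    # The engine starts the child workflow and surfaces its outputs
    # as step outputs in the parent.
    return workflow(params)

outputs = run_subworkflow(etl_workflow, {"rows": [1, 2, 3]})
```

The same child can then appear in many parents, which is what enables a graph of workflows with shared components rather than copy-pasted steps.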

3. Conditional branch

  • Subsequent steps run only if an upstream condition is met.
  • Conditions are SEL expressions evaluated at runtime against step parameters + outputs.
  • Combined with other primitives for audit-remediate-retry + error-handling flows.

Canonical use: skip expensive steps when inputs don't justify them; take remediation branches on audit failures.
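A conditional branch evaluates a runtime expression against upstream step outputs and gates downstream steps on the result. SEL is an expression language; in this sketch a plain Python predicate stands in for a SEL expression, and all names are illustrative:

```python
# Hypothetical sketch of a conditional branch: downstream steps run only
# when the condition holds against upstream step outputs.

def conditional_branch(step_outputs, condition, then_steps):
    """Run then_steps only if condition(step_outputs) is true."""
    if condition(step_outputs):
        return [step() for step in then_steps]
    return []  # branch skipped; downstream steps never run

upstream = {"audit_failed": True}
ran = conditional_branch(upstream,
                         lambda o: o["audit_failed"],
                         [lambda: "remediation"])
```

Because the engine evaluates the condition itself, it can skip the whole branch without ever scheduling the gated steps — the short-circuiting mentioned below under "Why engine-native implementation matters."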

Composition — the auto-recovery example

The post's worked example (rendered as text):

┌────────────────────────┐
│ subworkflow: job1      │    (ETL + audit subworkflow)
└───────────┬────────────┘
            ▼
┌────────────────────────┐
│ status_check_step      │    (read job1's audit status via SEL)
└───────────┬────────────┘
            ▼
┌────────────────────────┐
│ conditional branch     │
│   if audit_failed:     │
└───┬────────────────┬───┘
    │ (failed)       │ (passed)
    ▼                ▼
┌────────────────┐  (complete
│ recovery       │   workflow)
│ subworkflow    │
└───────┬────────┘
        ▼
┌────────────────────────┐
│ subworkflow: job2      │    (retry ETL + audit with same shape as job1)
└────────────────────────┘

The whole flow is composable — subworkflow + conditional branch + subworkflow — and the components are engine primitives, not custom per-team glue.
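The composition above can be condensed into a few lines. This is a minimal sketch of the auto-recovery shape (subworkflow, status check, conditional branch, recovery, retry) under illustrative names, not Maestro's API:

```python
# Hypothetical sketch of the auto-recovery composition:
# subworkflow -> status check -> conditional branch -> recovery + retry.

def run_auto_recovery(etl, recover):
    outputs = etl()                             # subworkflow: job1 (ETL + audit)
    audit_failed = not outputs["audit_passed"]  # status_check_step
    if audit_failed:                            # conditional branch
        recover()                               # recovery subworkflow
        outputs = etl()                         # subworkflow: job2 (retry, same shape)
    return outputs

# A flaky ETL whose audit fails until recovery has run.
state = {"fixed": False}

def etl():
    return {"audit_passed": state["fixed"]}

def recover():
    state["fixed"] = True

final = run_auto_recovery(etl, recover)
```

Note that each building block in the sketch corresponds to one engine primitive; the composition itself carries no per-team glue logic.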

Other composition shapes

  • Nested foreach inside foreach — 2D parameter sweeps (date × dataset).
  • Foreach over subworkflows — same pipeline invoked with heterogeneous inputs.
  • Conditional subworkflow selection — pick one of N subworkflows based on input characteristics.
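The first shape — nested foreach for a 2D sweep — reduces to a cross product of the two parameter collections, one instance per (date, dataset) pair. A minimal sketch under illustrative names:

```python
# Hypothetical sketch of a nested foreach (2D parameter sweep): the outer
# foreach iterates dates, the inner one datasets, yielding one workflow
# instance per (date, dataset) pair.

def sweep(dates, datasets):
    instances = []
    for di, date in enumerate(dates):            # outer foreach
        for si, dataset in enumerate(datasets):  # inner foreach
            instances.append({"loop_index": (di, si),
                              "date": date,
                              "dataset": dataset})
    return instances

grid = sweep(["2024-01-01", "2024-01-02"], ["clicks", "impressions"])
```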

Why engine-native implementation matters

"Direct engine support not only enables us to optimize these patterns but also ensures a consistent approach to implementing them."

Three concrete wins:

  1. Uniform optimisation — the engine can parallelise foreach iterations, batch subworkflow creation, short-circuit conditional evaluation without tenant-code opt-in.
  2. Consistent failure modes — retry, restart, breakpoints, rollup all interact correctly with the primitives.
  3. Lineage + observability — all composite structure is visible to the orchestrator for audit + debugging.

When not to use

  • Simple linear pipelines — don't reach for foreach / conditional when a straight DAG suffices.
  • Cross-process orchestration — if the composition needs to span orchestrator boundaries, use signals instead.
  • Real-time control loops — orchestrator composition is the wrong altitude for sub-second control; use Temporal-style workflow-as-code or a dedicated control plane.
