CONCEPT Cited by 1 source
Workflow breakpoint¶
A workflow breakpoint is an orchestrator-level pause-at-step primitive — the workflow equivalent of a debugger breakpoint. When a workflow instance reaches a step with a breakpoint, that step enters a paused state; the workflow graph does not advance past it until an operator manually resumes.
The canonical wiki instance is Netflix Maestro's breakpoint mechanism (Source: sources/2024-07-22-netflix-maestro-netflixs-workflow-orchestrator):
"Maestro allows users to set breakpoints on workflow steps, functioning similarly to code-level breakpoints in an IDE."
Semantics¶
- Per-instance resume — if multiple workflow instances are paused at the same breakpoint, resuming one only affects that instance; others remain paused.
- Breakpoint-deletion resumes all — "Deleting the breakpoint will cause all paused step instances to resume."
- Foreach-aware — "Setting a single breakpoint on a step will cause all iterations of the foreach loop to pause at that step for debugging purposes." One breakpoint fans out across the foreach's parallel instances.
- State mutation support — "the breakpoint feature allows human intervention during the workflow execution and can also be used for other purposes, e.g. supporting mutating step states while the workflow is running."
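The resume and deletion semantics above can be sketched with a toy in-memory model. This is an illustration only, not Maestro's API: `BreakpointRegistry`, its method names, and the step and instance ids are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BreakpointRegistry:
    """Toy model of breakpoint bookkeeping (hypothetical, not Maestro's API)."""
    breakpoints: set = field(default_factory=set)  # step ids with a breakpoint
    paused: dict = field(default_factory=dict)     # step id -> paused instance ids

    def set_breakpoint(self, step_id):
        self.breakpoints.add(step_id)
        self.paused.setdefault(step_id, set())

    def reach_step(self, step_id, instance_id):
        """Return True if the instance may run the step, False if it pauses."""
        if step_id in self.breakpoints:
            self.paused[step_id].add(instance_id)
            return False
        return True

    def resume(self, step_id, instance_id):
        """Resume a single paused instance; the others stay paused."""
        self.paused.get(step_id, set()).discard(instance_id)

    def delete_breakpoint(self, step_id):
        """Remove the breakpoint and return every instance it releases."""
        self.breakpoints.discard(step_id)
        return self.paused.pop(step_id, set())


reg = BreakpointRegistry()
reg.set_breakpoint("transform")
for instance_id in ("run-1", "run-2", "run-3"):
    reg.reach_step("transform", instance_id)   # all three pause

reg.resume("transform", "run-1")               # per-instance resume
still_paused = sorted(reg.paused["transform"])

released = sorted(reg.delete_breakpoint("transform"))  # deletion resumes the rest
```

In this model a foreach iteration would simply be another instance id reaching the same step, which is why a single breakpoint is enough to pause all of them.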
Canonical uses¶
1. Initial workflow development¶
New workflows often have subtle issues in step outputs, parameter passing, or conditional branching. Instead of re-running the whole workflow from scratch on every iteration, breakpoints let the developer pause at specific steps, inspect runtime state, and resume — dramatically shortening the dev-test cycle.
2. Foreach + many parameters¶
Foreach iterations with heterogeneous parameters often surface bugs that only appear for specific values. A breakpoint on a step inside the foreach pauses every iteration at that step, so the developer can inspect the full parameter space and resume iterations individually or, by deleting the breakpoint, all at once.
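A minimal sketch of that inspection loop, assuming a hypothetical dictionary-based representation of foreach iterations (the `run_foreach` and `resume` helpers, the `region` parameter, and the status strings are invented for illustration and are not Maestro's API):

```python
def run_foreach(params_list, step_id, breakpoints):
    """Start one iteration per parameter set; pause each at a breakpointed step."""
    iterations = []
    for index, params in enumerate(params_list):
        status = "PAUSED" if step_id in breakpoints else "RUNNING"
        iterations.append({"index": index, "params": params, "status": status})
    return iterations


def resume(iterations, indices=None):
    """Resume the selected iterations, or every paused one if indices is None."""
    for iteration in iterations:
        selected = indices is None or iteration["index"] in indices
        if iteration["status"] == "PAUSED" and selected:
            iteration["status"] = "RUNNING"


# One breakpoint on the "load" step pauses all three iterations.
iterations = run_foreach(
    [{"region": "us"}, {"region": "eu"}, {"region": ""}],
    step_id="load",
    breakpoints={"load"},
)

# Inspect the paused parameter space and flag the iteration with an empty region.
suspicious = [it["index"] for it in iterations if not it["params"]["region"]]

# Let the healthy iterations continue; keep the suspicious one paused.
resume(iterations, indices={0, 1})
statuses = [it["status"] for it in iterations]
```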
3. In-flight state mutation¶
The most operational use: an in-flight workflow has a wrong parameter value or corrupted intermediate state. Setting a breakpoint at the next step, mutating the state, then resuming avoids tearing down and re-running the whole workflow (which might be partway through a long ETL).
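The pause, mutate, resume sequence might look like the following sketch. The `StepInstance` type, the `mutate_params` and `resume` helpers, and the `date` parameter are hypothetical; the source does not describe the interface Maestro exposes for state mutation. The one design point the sketch makes is that mutation is only accepted while the step is paused.

```python
from dataclasses import dataclass


@dataclass
class StepInstance:
    """Hypothetical stand-in for a step instance held at a breakpoint."""
    step_id: str
    params: dict
    status: str = "PAUSED"


def mutate_params(step, **overrides):
    """Overwrite parameters, but only while the step is paused."""
    if step.status != "PAUSED":
        raise RuntimeError("can only mutate a paused step")
    step.params.update(overrides)


def resume(step):
    """Let the step continue with whatever parameters it now holds."""
    step.status = "RUNNING"


# The step paused with a wrong partition date; fix it in place and continue.
step = StepInstance(step_id="load", params={"date": "2024-07-21"})
mutate_params(step, date="2024-07-22")
resume(step)
```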
4. Human-in-the-loop gates¶
Although not the primary framing in the post, a per-step breakpoint effectively creates a manual-approval gate — useful for production promotion workflows where a human needs to confirm before the next step fires.
Why this is rare in mainstream orchestrators¶
Most workflow orchestrators (Airflow, Step Functions, Argo) treat workflow execution as closer to batch than to interactive: debugging is mostly a read-only activity against completed runs, and the closest thing to intervention is manually clearing and retrying specific tasks.
Maestro's breakpoint primitive is closer to live interactive debugging: the workflow is paused but not failed, its state is inspectable and mutable, and resumption continues from the paused point. This requires:
- Persistent and resumable step-runtime state (which Maestro already has for other reasons, namely restart and retry support).
- Cooperation between the orchestrator and the step runtime so that the pause signal is honoured.
- Per-instance granularity so that pausing one instance doesn't stall unrelated workflow instances.
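How these three requirements fit together can be sketched as follows. Everything here is an assumption made for illustration: the `StateStore` class stands in for a durable store by round-tripping state through JSON, and `advance` is a hypothetical step-runtime loop, not Maestro's implementation.

```python
import json


class StateStore:
    """Stand-in for a durable store: state is serialised on every save."""

    def __init__(self):
        self._rows = {}

    def save(self, instance_id, state):
        self._rows[instance_id] = json.dumps(state)

    def load(self, instance_id):
        return json.loads(self._rows[instance_id])


def advance(store, instance_id, steps, breakpoints, resumed=False):
    """Run steps from the persisted cursor until done or a breakpoint is hit."""
    state = store.load(instance_id)
    while state["cursor"] < len(steps):
        step_id = steps[state["cursor"]]
        if step_id in breakpoints and not resumed:
            # Honour the pause signal: persist the position and stop, without failing.
            state["status"] = "PAUSED"
            store.save(instance_id, state)
            return state
        resumed = False  # a resume only covers the step the instance was paused on
        state["done"].append(step_id)
        state["cursor"] += 1
    state["status"] = "SUCCEEDED"
    store.save(instance_id, state)
    return state


store = StateStore()
steps = ["extract", "transform", "load"]
breakpoints = {"transform"}
for instance_id in ("run-1", "run-2"):
    store.save(instance_id, {"cursor": 0, "done": [], "status": "RUNNING"})
    advance(store, instance_id, steps, breakpoints)  # both pause at "transform"

# Resume only run-1; it continues from the persisted cursor, run-2 stays paused.
finished = advance(store, "run-1", steps, breakpoints, resumed=True)
other = store.load("run-2")
```

Because the cursor is persisted before the runtime returns, resuming does not depend on the paused process staying alive, which is the same property restart and retry support rely on.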
Not to be confused with¶
- Sensor operators (Airflow) — wait for an external condition. Not an interactive debugging tool.
- Manual-approval steps (Argo's suspend template, Step Functions' wait-for-task-token callback pattern): declared as part of the workflow definition, not set ad hoc on a running instance.
- Pause-and-resume at workflow level (many orchestrators) — pauses the whole workflow, not a specific step for specific instances.
Seen in¶
- sources/2024-07-22-netflix-maestro-netflixs-workflow-orchestrator: the canonical description and four operational use-cases