PATTERN Cited by 1 source
Compensation stub recovery via replay¶
Intent¶
Rebuild in-memory callable references to compensation (rollback) handlers after an engine crash or restart, by replaying the workflow code and re-encountering the step.do() calls that originally registered them — without re-executing the forward step bodies.
Context¶
In a durable workflow engine, rollback handlers are registered as in-memory callable references (stubs) when forward steps are declared. If the engine is evicted, crashes, or restarts, these in-memory references are lost. The engine still knows from its persisted step history which steps registered a rollback handler, but it no longer holds anything it can call. The problem: how to recover the callable handlers without re-invoking external side effects.
Solution¶
Use the engine's existing replay mechanism:
- On recovery, re-run the Workflow code from the beginning.
- When replay encounters a completed step.do(), read the persisted result instead of re-executing the step body (standard replay behavior).
- When replay encounters a step.do() with a rollback option whose step is eligible for compensation, re-register the rollback stub from the options object.
- Once replay completes, the engine has rebuilt all callable handlers it needs to execute the reverse walk.
The key insight: the rollback function is source code co-located with the step definition (not data stored in the durable log). Replay re-encounters the source code, which re-registers the callable reference, which the engine can then invoke.
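The recovery flow above can be sketched with a toy engine. This is a minimal illustration, not the Workflows implementation: ToyEngine, StepRecord, and StepOptions are hypothetical names, the model is synchronous for brevity (the real step.do() is asynchronous), and "eligible for compensation" is assumed to mean "completed with a rollback registered".

```typescript
// Toy, synchronous model of replay-based stub recovery.
// ToyEngine, StepRecord and StepOptions are hypothetical names, not the Workflows API.
type Rollback = (output: unknown) => void;
type StepOptions = { rollback?: Rollback };
type StepRecord = { name: string; output: unknown; rollbackRegistered: boolean };

class ToyEngine {
  // In-memory callable references: lost whenever the engine restarts.
  stubs: Record<string, Rollback> = {};
  // Stands in for the durable step history, which survives restarts.
  history: StepRecord[];

  constructor(history: StepRecord[]) {
    this.history = history;
  }

  do<T>(name: string, options: StepOptions, body: () => T): T {
    let record: StepRecord | undefined;
    for (const r of this.history) if (r.name === name) record = r;

    if (record === undefined) {
      // First execution: run the body and persist its output.
      record = { name, output: body(), rollbackRegistered: options.rollback !== undefined };
      this.history.push(record);
    }
    // On first execution and on replay alike, the stub comes from the options
    // object in the source code, never from the persisted record.
    if (record.rollbackRegistered && options.rollback) this.stubs[name] = options.rollback;
    return record.output as T;
  }

  // Reverse walk: invoke recovered stubs newest-first with the persisted outputs.
  compensate(): void {
    for (const r of this.history.slice().reverse()) {
      const stub = this.stubs[r.name];
      if (stub) stub(r.output);
    }
  }
}

const log: string[] = [];

function workflow(step: ToyEngine): void {
  const txn = step.do(
    "charge-card",
    { rollback: (out) => { log.push("refund " + String(out)); } },
    () => { log.push("charged"); return "txn-1"; },
  );
  step.do(
    "reserve-stock",
    { rollback: (out) => { log.push("release " + String(out)); } },
    () => { log.push("reserved"); return "res-" + txn; },
  );
}

const durableHistory: StepRecord[] = [];
workflow(new ToyEngine(durableHistory)); // original run: both bodies execute

// Simulated crash: the old engine and its stubs are gone, the history is not.
const recovered = new ToyEngine(durableHistory);
workflow(recovered);   // replay: bodies are skipped, stubs are re-registered
recovered.compensate(); // reverse walk using the rebuilt stubs
```

In this sketch the log contains "charged" and "reserved" exactly once, followed by the two rollback entries in reverse order, which shows that replay rebuilt the handlers without repeating the forward side effects.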
Properties¶
- No serialization of handler code. The engine doesn't persist the text or bytecode of the rollback function. It persists only the fact that a rollback was registered, plus the step output. The handler itself is recovered by running the code.
- Relies on deterministic replay. The Workflow must take the same code path on replay to encounter the same step.do() registrations. Non-deterministic branching before a rollback-bearing step could prevent recovery.
- Workers RPC stubs provide the callable reference model. In Cloudflare's implementation, stubs are handles to code running elsewhere, with lifetime management via dup(). Rollback stubs outlive the immediate step.do() scope.
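The determinism requirement can be illustrated with the same kind of toy model. The sketch below is an assumption-laden illustration: ToyEngine and missingStubs() are hypothetical, the model is synchronous, and the non-deterministic value is passed in as a parameter so the example behaves the same on every run. It contrasts a workflow that branches on a value computed outside any step with one that captures that value as a step output, so replay reads the persisted value and takes the same branch.

```typescript
// Hypothetical toy engine; not the Workflows API.
type Rollback = (output: unknown) => void;
type StepRecord = { name: string; output: unknown; rollbackRegistered: boolean };

class ToyEngine {
  stubs: Record<string, Rollback> = {};
  history: StepRecord[];

  constructor(history: StepRecord[]) {
    this.history = history;
  }

  do<T>(name: string, options: { rollback?: Rollback }, body: () => T): T {
    let record: StepRecord | undefined;
    for (const r of this.history) if (r.name === name) record = r;
    if (record === undefined) {
      record = { name, output: body(), rollbackRegistered: options.rollback !== undefined };
      this.history.push(record);
    }
    if (record.rollbackRegistered && options.rollback) this.stubs[name] = options.rollback;
    return record.output as T;
  }

  // Steps whose history says "rollback registered" but whose stub was not rebuilt.
  missingStubs(): string[] {
    return this.history
      .filter((r) => r.rollbackRegistered && !(r.name in this.stubs))
      .map((r) => r.name);
  }
}

// Fragile: the branch depends on a value that can differ between run and replay.
function fragile(step: ToyEngine, coin: boolean): void {
  if (coin) {
    step.do("provision-vm", { rollback: () => {} }, () => "vm-1");
  }
}

// Safer: the value is itself a step output, so replay reuses the persisted result.
function safer(step: ToyEngine, flip: () => boolean): void {
  const coin = step.do("flip-coin", {}, flip);
  if (coin) {
    step.do("provision-vm", { rollback: () => {} }, () => "vm-1");
  }
}

const fragileHistory: StepRecord[] = [];
fragile(new ToyEngine(fragileHistory), true);  // original run takes the branch
const fragileReplay = new ToyEngine(fragileHistory);
fragile(fragileReplay, false);                 // replay takes the other branch
const fragileMissing = fragileReplay.missingStubs();

const saferHistory: StepRecord[] = [];
safer(new ToyEngine(saferHistory), () => true);
const saferReplay = new ToyEngine(saferHistory);
safer(saferReplay, () => false);               // flip is not re-run during replay
const saferMissing = saferReplay.missingStubs();
```

In the fragile variant the replay never reaches provision-vm, so its rollback stub stays unrecovered even though the history records that one was registered. In the safer variant the persisted flip-coin output steers replay down the original path and nothing is missing.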
Trade-offs¶
- Elegant in the common case — no extra persistence or serialization needed for handler code.
- Fragile under code changes — if a deploy changes the Workflow between the original execution and the crash recovery, replay may encounter different rollback logic than was originally registered.
- Determinism dependency — non-deterministic Workflow code could prevent replay from reaching the correct step registrations.
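The code-change trade-off follows directly from recovering handlers out of source code rather than out of the log. The sketch below is a toy illustration under stated assumptions: ToyEngine, workflowV1, and workflowV2 are hypothetical, the model is synchronous, and the rollback handlers return a description string instead of performing side effects. It shows that a replay under a newer deploy registers whatever rollback the current source defines.

```typescript
// Hypothetical toy engine; not the Workflows API.
type Rollback = (output: string) => string;
type StepRecord = { name: string; output: string; rollbackRegistered: boolean };

class ToyEngine {
  stubs: Record<string, Rollback> = {};
  history: StepRecord[];

  constructor(history: StepRecord[]) {
    this.history = history;
  }

  do(name: string, options: { rollback?: Rollback }, body: () => string): string {
    let record: StepRecord | undefined;
    for (const r of this.history) if (r.name === name) record = r;
    if (record === undefined) {
      record = { name, output: body(), rollbackRegistered: options.rollback !== undefined };
      this.history.push(record);
    }
    if (record.rollbackRegistered && options.rollback) this.stubs[name] = options.rollback;
    return record.output;
  }

  // Invoke the currently registered stub for a step with its persisted output.
  rollback(name: string): string {
    const record = this.history.filter((r) => r.name === name)[0];
    const stub = this.stubs[name];
    if (!record || !stub) return "no stub for " + name;
    return stub(record.output);
  }
}

// Code as deployed when the instance first ran.
function workflowV1(step: ToyEngine): void {
  step.do("charge-card", { rollback: (txn) => "full refund of " + txn }, () => "txn-1");
}

// A later deploy changed the rollback logic for the same step.
function workflowV2(step: ToyEngine): void {
  step.do("charge-card", { rollback: (txn) => "store credit for " + txn }, () => "txn-1");
}

const history: StepRecord[] = [];
const original = new ToyEngine(history);
workflowV1(original);
const registeredOriginally = original.rollback("charge-card");

// Crash, then recovery under the new deploy: replay sees only the V2 source.
const recovered = new ToyEngine(history);
workflowV2(recovered);
const registeredAfterReplay = recovered.rollback("charge-card");
```

Because the history records only that a rollback was registered, the toy engine has no way to notice that the recovered handler differs from the one registered during the original execution.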
Seen in¶
- sources/2026-06-25-cloudflare-saga-rollbacks-for-workflows — canonical wiki instance. Cloudflare Workflows uses replay to rebuild rollback stubs after engine restart. "To recover, Workflows uses replay: a recovery mode where it can re-run the Workflow code without re-executing completed forward step bodies. [...] As those step.do() calls are encountered, their rollback options can register the callable stubs again."
Related¶
- concepts/workflow-replay-from-checkpointed-actions — the underlying replay mechanism
- concepts/durable-execution — the property that makes persisted step history available for replay decisions
- concepts/workflow-compensation-action — the abstract compensation concept recovered by this mechanism
- patterns/saga-rollback-as-step-metadata — the API shape whose co-location property enables this recovery
- patterns/saga-over-long-transaction — the overarching saga pattern
- systems/cloudflare-workflows — canonical implementation