PATTERN Cited by 1 source
Rollback-capable migration tool
What it is
A rollback-capable migration tool is a bespoke automation tool where the reverse transition is a first-class command — not an emergency escape hatch, not "we'll write it if we need it." Both forward and backward transitions:
- Use the same mechanism (cordon + drain + replace, respecting the same safety contracts).
- Are equally tested.
- Run under the same CI/CD pipeline.
- Are idempotent.
The result is that rollback is a cheap operation, which in turn makes the whole migration cheap — you can try a stage, soak it, and unwind if the soak reveals issues, without operational heroics.
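A minimal Python sketch of the symmetry listed above, assuming a toy in-memory model. `Manager`, `Cluster`, and `transition` are hypothetical names for illustration, not the API of any real tool, and the cordon/drain/replace steps are only logged rather than executed.

```python
from dataclasses import dataclass, field
from enum import Enum


class Manager(Enum):
    LEGACY = "legacy"  # e.g. Auto Scaling group-managed nodes
    TARGET = "target"  # e.g. Karpenter-managed nodes


@dataclass
class Cluster:
    name: str
    manager: Manager = Manager.LEGACY
    log: list = field(default_factory=list)  # steps that were actually run


def transition(cluster: Cluster, to: Manager) -> bool:
    """Move `cluster` to manager `to`; return True if any work was done.

    Forward and reverse share this one code path and differ only in the
    target. Re-running after success is a no-op, which makes it idempotent.
    """
    if cluster.manager is to:
        return False  # already in the desired state
    for step in ("cordon", "drain", "replace"):
        cluster.log.append(f"{step}:{cluster.manager.value}->{to.value}")
    cluster.manager = to
    return True
```

Calling `transition(c, Manager.TARGET)` twice does the work once, and `transition(c, Manager.LEGACY)` unwinds it through the same three steps.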
Why this is distinctive
Most migration tooling has an asymmetric cost structure: forward migration is automated, while rollback is a manually scripted emergency procedure. This asymmetry biases the organization against rolling back even when it should, because the rollback can cost an order of magnitude more than the forward step in engineering time and risk.
Investing to make the reverse transition first-class changes the operational economics:
- Soak times (see patterns/phased-migration-with-soak-times) are safe because rollback is cheap.
- Risk-based sequencing is meaningful because the low-risk stages are genuinely low-risk — you can walk them back.
- Individual cluster regressions don't block the whole migration — unwind the affected one, keep moving.
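The "unwind the affected one, keep moving" behaviour can be sketched as a small stage runner. This is an illustrative sketch only: `run_stage` and its callback parameters are hypothetical, and the soak itself is reduced to a single health predicate.

```python
from typing import Callable, Iterable


def run_stage(
    clusters: Iterable[str],
    forward: Callable[[str], None],
    reverse: Callable[[str], None],
    healthy_after_soak: Callable[[str], bool],
) -> dict:
    """Migrate each cluster, soak it, and unwind only the ones that regress."""
    outcome = {}
    for name in clusters:
        forward(name)
        if healthy_after_soak(name):
            outcome[name] = "migrated"
        else:
            reverse(name)  # roll back this cluster only; the stage continues
            outcome[name] = "rolled_back"
    return outcome
```

The design choice worth noting is that a failed soak is an ordinary branch, not an exception: the stage records the rollback and moves on to the next cluster.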
Salesforce canonical instance
The 2026-01-12 Karpenter migration post describes the rollback capability explicitly as a design principle:
"The team developed an in-house Karpenter transition tool to orchestrate the switch-over safely and consistently, and a Karpenter patching check tool. Karpenter transition tool and Karpenter patching check tool provide a comprehensive solution for migrating Kubernetes clusters to and from Karpenter node management while maintaining operational continuity through automated node rotation, Amazon Machine Image (AMI) validation, and graceful pod eviction handling."
*"Key design principles included:
- Zero disruption – The tool cordoned and drained legacy nodes with full respect for pod disruption budgets (PDBs), maintaining workload safety
- Rollback support – A reverse transition capability allowed fast recovery to Auto Scaling group–based auto scaling if needed
- Continuous integration and continuous delivery (CI/CD) integration – The tool was embedded in the core infrastructure provisioning pipeline, standardizing the migration across services."* (Source: sources/2026-01-12-aws-salesforce-karpenter-migration-1000-eks-clusters)
Three load-bearing properties in the quoted passage: to and from (not just forward), fast recovery (not an emergency procedure), and standardized across services (not ad hoc per team).
Shape
- Reversible primitive operation. Each migration operation (cordon, drain, replace, update config) has a precise inverse. Often the inverse is the same operation with different parameters: to roll back, cordon the Karpenter nodes, drain them, restore the ASG config, and let the Cluster Autoscaler (CA) take over.
- Same safety contracts both ways. PDB respect applies to forward cordoning and to rollback cordoning.
- State snapshot before forward step. Record enough state (full legacy config, node labels, allocations) to fully restore on rollback.
- Single entry point. One tool that dispatches forward or reverse based on a flag, not two separate codebases.
- CI/CD-integrated. The reverse direction runs through the same pipeline gates, approvals, and audit trail as the forward direction.
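A hedged sketch of the "state snapshot" and "single entry point" items above, using Python's standard `argparse`. The cluster is modelled as a plain dict; the field names (`node_manager`, `asg_config`), the `--direction` flag, and the in-memory `SNAPSHOTS` store are all assumptions made for illustration, not details from the Salesforce tool.

```python
import argparse
import copy

SNAPSHOTS: dict = {}  # hypothetical store; a real tool would persist this


def forward(cluster: dict) -> None:
    # Snapshot the legacy state before changing anything. setdefault keeps
    # the first snapshot, so re-running forward never overwrites it.
    SNAPSHOTS.setdefault(cluster["name"], copy.deepcopy(cluster))
    cluster["node_manager"] = "karpenter"
    cluster.pop("asg_config", None)


def reverse(cluster: dict) -> None:
    snap = SNAPSHOTS.get(cluster["name"])
    if snap is None:
        raise RuntimeError(f"no snapshot for {cluster['name']}; refusing to roll back")
    cluster.clear()
    cluster.update(copy.deepcopy(snap))  # restore the recorded legacy state


def main(argv: list, cluster: dict) -> None:
    """One entry point; the flag picks the direction."""
    parser = argparse.ArgumentParser(prog="transition")
    parser.add_argument("--direction", choices=["forward", "reverse"], required=True)
    args = parser.parse_args(argv)
    {"forward": forward, "reverse": reverse}[args.direction](cluster)
```

For example, `main(["--direction", "forward"], cluster)` followed by `main(["--direction", "reverse"], cluster)` leaves `cluster` equal to its original contents, because the reverse path restores from the snapshot rather than trying to recompute the old config.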
Trade-offs
- Build cost. Engineering a fully reversible tool is more expensive than engineering a forward-only one. It justifies itself only at scale (Salesforce: 1,000 clusters and a multi-month rollout).
- Testing surface. Both directions need test coverage.
- Config symmetry enforcement. The new config must be losslessly round-trippable to the old one. If the new system's schema is strictly more expressive than the old one, any migrated config that comes to rely on the extra expressiveness has no exact equivalent on the old side and can't round-trip.
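The round-trip constraint can be checked mechanically. The sketch below assumes two invented schemas: a legacy one with a single `instance_type` and a newer one with a list of `instance_types`. Neither is the real ASG or Karpenter schema, and `to_new`, `to_legacy`, and `round_trips` are hypothetical helpers.

```python
def to_new(legacy_cfg: dict) -> dict:
    """Hypothetical forward mapping into the more expressive schema."""
    return {
        "instance_types": [legacy_cfg["instance_type"]],
        "min": legacy_cfg["min"],
        "max": legacy_cfg["max"],
    }


def to_legacy(new_cfg: dict) -> dict:
    """Hypothetical reverse mapping; the legacy schema holds one instance type."""
    types = new_cfg["instance_types"]
    if len(types) != 1:
        raise ValueError(f"cannot express {len(types)} instance types in legacy schema")
    return {"instance_type": types[0], "min": new_cfg["min"], "max": new_cfg["max"]}


def round_trips(new_cfg: dict) -> bool:
    """True if new_cfg survives new -> legacy -> new unchanged."""
    try:
        return to_new(to_legacy(new_cfg)) == new_cfg
    except ValueError:
        return False
```

A config produced by `to_new` passes the check, while one that lists two instance types fails it. Running such a check as a pipeline gate is one way to catch configs that have drifted out of the reversible subset before a rollback is needed.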
Related patterns
- patterns/phased-migration-with-soak-times — the pattern that rollback-capability enables; without cheap rollback, soak is just "wait and hope."
- patterns/automated-configuration-mapping: covers the forward direction (legacy config to new config); a rollback-capable tool also needs the inverse mapping.
- patterns/fast-rollback — sibling pattern at the deployment level; this is the migration-scale variant.
- patterns/rollout-escape-hatch — sibling pattern for rollout safety.
Seen in
- sources/2026-01-12-aws-salesforce-karpenter-migration-1000-eks-clusters — Salesforce's Karpenter transition tool has rollback-to-ASG as a first-class design principle, embedded in CI/CD. Canonical wiki instance.