PATTERN Cited by 1 source

Rollback-capable migration tool¶

What it is¶

A rollback-capable migration tool is a bespoke automation tool where the reverse transition is a first-class command — not an emergency escape hatch, not "we'll write it if we need it." Both forward and backward transitions:

Use the same mechanism (cordon + drain + replace, respecting the same safety contracts).
Are equally tested.
Run under the same CI/CD pipeline.
Are idempotent.

The result is that rollback is a cheap operation, which in turn makes the whole migration cheap — you can try a stage, soak it, and unwind if the soak reveals issues, without operational heroics.
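The symmetry and idempotence properties listed above can be sketched with a small in-memory model. This is an illustration only: `Node`, `Cluster`, `rotate`, and the `"asg"` / `"karpenter"` labels are hypothetical names, not the Salesforce tool's API, and the "drain" step is a stand-in for real pod eviction.

```python
from dataclasses import dataclass, field

MANAGERS = ("asg", "karpenter")  # hypothetical labels for the two node managers


@dataclass
class Node:
    name: str
    manager: str
    cordoned: bool = False


@dataclass
class Cluster:
    nodes: list = field(default_factory=list)
    log: list = field(default_factory=list)  # records every drain performed


def rotate(cluster: Cluster, target: str) -> int:
    """Move every node to `target` using one code path for both directions.

    Returns the number of nodes replaced; 0 means the call was a no-op.
    """
    if target not in MANAGERS:
        raise ValueError(f"unknown manager: {target}")
    replaced = 0
    for i, node in enumerate(cluster.nodes):
        if node.manager == target:
            continue  # already in the desired state: idempotent skip
        node.cordoned = True                              # cordon
        cluster.log.append(("drain", node.name))          # drain (stand-in)
        cluster.nodes[i] = Node(f"{target}-{i}", target)  # replace
        replaced += 1
    return replaced


cluster = Cluster([Node("asg-0", "asg"), Node("asg-1", "asg")])
forward = rotate(cluster, "karpenter")  # forward transition
again = rotate(cluster, "karpenter")    # repeating it changes nothing
backward = rotate(cluster, "asg")       # rollback is the same function
```

Because the direction is only a parameter, any test written for the forward path exercises the same code the rollback path runs.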

Why this is distinctive¶

Most migration tooling has an asymmetric cost structure: forward migration is automated, while rollback is a manually scripted emergency procedure. This asymmetry biases the organization toward not rolling back even when it should, because the rollback can cost an order of magnitude more than the forward step in engineering time and risk.

Investing to make the reverse transition first-class changes the operational economics:

Soak times (see patterns/phased-migration-with-soak-times) are safe because rollback is cheap.
Risk-based sequencing is meaningful because the low-risk stages are genuinely low-risk — you can walk them back.
Individual cluster regressions don't block the whole migration — unwind the affected one, keep moving.
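The "unwind the affected one, keep moving" behaviour can be sketched as a per-cluster loop. The function `run_stage`, its callback parameters, and the cluster names are hypothetical; a real soak would wait and inspect metrics rather than call a predicate immediately.

```python
from typing import Callable, Dict, List


def run_stage(
    clusters: List[str],
    migrate: Callable[[str], None],
    rollback: Callable[[str], None],
    healthy_after_soak: Callable[[str], bool],
) -> Dict[str, str]:
    """Migrate each cluster, soak it, and unwind only the ones that regress."""
    outcome = {}
    for name in clusters:
        migrate(name)
        if healthy_after_soak(name):
            outcome[name] = "migrated"
        else:
            rollback(name)  # cheap, so one regression does not halt the stage
            outcome[name] = "rolled_back"
    return outcome


# Toy state: which node manager each (hypothetical) cluster is on.
state = {"dev-1": "asg", "dev-2": "asg", "dev-3": "asg"}
result = run_stage(
    clusters=list(state),
    migrate=lambda c: state.__setitem__(c, "karpenter"),
    rollback=lambda c: state.__setitem__(c, "asg"),
    healthy_after_soak=lambda c: c != "dev-2",  # pretend dev-2 regresses
)
```

Here `dev-2` ends the stage back on its legacy manager while the other two clusters stay migrated.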

Salesforce canonical instance¶

The 2026-01-12 Karpenter migration post describes the rollback capability explicitly as a design principle:

"The team developed an in-house Karpenter transition tool to orchestrate the switch-over safely and consistently, and a Karpenter patching check tool. Karpenter transition tool and Karpenter patching check tool provide a comprehensive solution for migrating Kubernetes clusters to and from Karpenter node management while maintaining operational continuity through automated node rotation, Amazon Machine Image (AMI) validation, and graceful pod eviction handling."

"Key design principles included:

Zero disruption – The tool cordoned and drained legacy nodes with full respect for pod disruption budgets (PDBs), maintaining workload safety

Rollback support – A reverse transition capability allowed fast recovery to Auto Scaling group–based auto scaling if needed

Continuous integration and continuous delivery (CI/CD) integration – The tool was embedded in the core infrastructure provisioning pipeline, standardizing the migration across services." (Source: sources/2026-01-12-aws-salesforce-karpenter-migration-1000-eks-clusters)

Three load-bearing properties in this passage: to and from (not just forward), fast recovery (not an emergency procedure), standardized (one tool across services, not per-service scripts).

Shape¶

Reversible primitive operation. Each migration operation (cordon, drain, replace, update config) has a precise inverse. Often the inverse is the same operation with different parameters: cordon the Karpenter node, drain it, restore the ASG config, and let Cluster Autoscaler (CA) take over.
Same safety contracts both ways. PDB respect applies to forward cordoning and to rollback cordoning.
State snapshot before forward step. Record enough state (full legacy config, node labels, allocations) to fully restore on rollback.
Single entry point. One tool that dispatches forward or reverse based on a flag, not two separate codebases.
CI/CD-integrated. Both directions run through the same pipeline gate / approval / audit-trail that applies to the forward direction.
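A minimal sketch of how the shape above might fit together: one entry point, a direction flag, a snapshot taken before the forward step, and the same safety check in both directions. Everything here is an assumption for illustration: the `--direction` flag, `ClusterState`, and the `disruptions_allowed` counter (a stand-in for a PDB check) are hypothetical and do not describe the real tool's interface.

```python
import argparse
import copy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ClusterState:
    manager: str = "asg"
    legacy_config: dict = field(
        default_factory=lambda: {"min_size": 3, "max_size": 10}
    )
    disruptions_allowed: int = 1      # stand-in for a PDB budget
    snapshot: Optional[dict] = None   # filled in before the forward step


def transition(state: ClusterState, direction: str) -> ClusterState:
    """Single code path; `direction` selects forward or reverse."""
    # Same safety contract in both directions.
    if state.disruptions_allowed <= 0:
        raise RuntimeError("drain blocked by disruption budget")
    if direction == "forward":
        if state.manager == "karpenter":
            return state  # idempotent: already migrated
        state.snapshot = copy.deepcopy(state.legacy_config)  # snapshot first
        state.legacy_config = {}
        state.manager = "karpenter"
    else:
        if state.manager == "asg":
            return state  # idempotent: already rolled back
        if state.snapshot is None:
            raise RuntimeError("no snapshot to restore from")
        state.legacy_config = copy.deepcopy(state.snapshot)
        state.manager = "asg"
    return state


def main(argv: list, state: ClusterState) -> ClusterState:
    parser = argparse.ArgumentParser(prog="transition-tool")
    parser.add_argument(
        "--direction", choices=["forward", "reverse"], required=True
    )
    args = parser.parse_args(argv)
    return transition(state, args.direction)


state = ClusterState()
main(["--direction", "forward"], state)
after_forward = (state.manager, dict(state.legacy_config))
main(["--direction", "reverse"], state)
```

Keeping the dispatch inside one function means a pipeline gate that approves `transition-tool` approves both directions at once, rather than leaving the reverse path as an unreviewed side script.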

Trade-offs¶

Build cost. Engineering a fully reversible tool is more expensive than engineering a forward-only one. Justifies itself only at scale (Salesforce: 1,000 clusters + multi-month rollout).
Testing surface. Both directions need test coverage.
Config symmetry enforcement. The new config must be losslessly round-trippable to the old one. If the new system's schema is strictly more expressive than the old one, any config that adopts a new-only feature after the forward step has no legacy equivalent and can't round-trip.
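The round-trip requirement can be checked mechanically. The sketch below assumes a toy pair of schemas; `to_new`, `to_legacy`, and the field names are hypothetical and far simpler than a real ASG or Karpenter config. It shows one config that round-trips and one that cannot, because it uses a field the legacy schema has no way to express.

```python
def to_new(legacy: dict) -> dict:
    """Hypothetical forward mapping from a legacy scaling config."""
    return {
        "limits": {"nodes": legacy["max_size"]},
        "instance_types": [legacy["instance_type"]],
    }


def to_legacy(new: dict) -> dict:
    """Hypothetical inverse mapping; refuses configs it cannot represent."""
    types = new["instance_types"]
    if len(types) != 1:
        raise ValueError("legacy schema supports exactly one instance type")
    return {"max_size": new["limits"]["nodes"], "instance_type": types[0]}


def round_trips(legacy: dict) -> bool:
    """True when forward followed by reverse reproduces the input exactly."""
    return to_legacy(to_new(legacy)) == legacy


legacy = {"max_size": 10, "instance_type": "m5.xlarge"}
ok = round_trips(legacy)

# A config on the new side that uses something the legacy schema lacks.
widened = to_new(legacy)
widened["instance_types"].append("m6i.xlarge")
try:
    to_legacy(widened)
    reversible = True
except ValueError:
    reversible = False
```

Running a check like `round_trips` in the pipeline before each forward step is one way to catch a config that would be stranded on the new side.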

Related¶

patterns/phased-migration-with-soak-times — the pattern that rollback-capability enables; without cheap rollback, soak is just "wait and hope."
patterns/automated-configuration-mapping — forward direction; pairs with rollback-capable tooling as the inverse.
patterns/fast-rollback — sibling pattern at the deployment level; this is the migration-scale variant.
patterns/rollout-escape-hatch — sibling pattern for rollout safety.

Seen in¶

sources/2026-01-12-aws-salesforce-karpenter-migration-1000-eks-clusters — Salesforce's Karpenter transition tool has rollback-to-ASG as a first-class design principle, embedded in CI/CD. Canonical wiki instance.