
PATTERN Cited by 1 source

Non-disruptive migration (live tenant relocation)

Intent

Move a live tenant's state between servers, hardware generations, or on-disk formats without customer-visible disruption, so that the fleet underneath the tenant can be rebuilt, upgraded, or reformatted continuously.

Why it is a foundational primitive

Per Marc Olson's EBS retrospective, this is the capability that makes concepts/incremental-delivery possible at fleet scale:

This ability to migrate customer volumes to new storage servers has come in handy several times throughout EBS's history as we've identified new, more efficient data structures for our on-disk format, or brought in new hardware to replace the old hardware. There are volumes still active from the first few months of EBS's launch in 2008. These volumes have likely been on hundreds of different servers and multiple generations of hardware as we've updated and rebuilt our fleet, all without impacting the workloads on those volumes.

Build it once, reap the benefit on every future architectural change.

Prerequisites

  • State has a stable identity independent of the current server. In EBS, a volume ID is not tied to the storage server currently hosting it.
  • Replication / durability boundary exists inside the data plane. You can start a new replica on the target, catch it up, cut over, and retire the old one, all without asking the tenant to participate.
  • Control plane can schedule migrations (capacity planning, draining, topology-aware placement).
  • Software supports rolling forward on-disk formats. Old and new formats coexist during transition.
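The replica, catch-up, cut-over, and retire sequence described in the prerequisites can be sketched in a few lines. This is a toy in-memory model, not how EBS implements it: `Server`, `ControlPlane`, `migrate`, and the `writes_during_copy` parameter are hypothetical names introduced only for illustration, and real systems would handle concurrency, failure, and durability that this sketch ignores.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical storage server holding block data per volume ID."""
    name: str
    blocks: dict = field(default_factory=dict)  # volume_id -> {block_no: bytes}


class ControlPlane:
    """Hypothetical control plane: maps a stable volume ID to its current host."""

    def __init__(self):
        self.placement = {}  # volume_id -> Server

    def write(self, volume_id, block_no, data):
        # Tenants address the volume by ID only; they never name a server.
        server = self.placement[volume_id]
        server.blocks.setdefault(volume_id, {})[block_no] = data

    def read(self, volume_id, block_no):
        return self.placement[volume_id].blocks[volume_id][block_no]

    def migrate(self, volume_id, target, writes_during_copy=()):
        source = self.placement[volume_id]
        # 1. Start a new replica on the target from the source's current state.
        target.blocks[volume_id] = dict(source.blocks.get(volume_id, {}))
        # 2. Simulate tenant writes that land on the source while the copy runs.
        for block_no, data in writes_during_copy:
            self.write(volume_id, block_no, data)
        # 3. Catch up: copy any block that differs from the source.
        for block_no, data in source.blocks[volume_id].items():
            if target.blocks[volume_id].get(block_no) != data:
                target.blocks[volume_id][block_no] = data
        # 4. Cut over: the volume ID now resolves to the target.
        self.placement[volume_id] = target
        # 5. Retire the old replica.
        del source.blocks[volume_id]


if __name__ == "__main__":
    old, new = Server("gen1"), Server("gen2")
    cp = ControlPlane()
    cp.placement["vol-1"] = old
    cp.write("vol-1", 0, b"a")
    cp.migrate("vol-1", new, writes_during_copy=[(1, b"b")])
    print(cp.placement["vol-1"].name, cp.read("vol-1", 0), cp.read("vol-1", 1))
```

The point of the sketch is that the tenant only ever uses the volume ID: reads and writes before, during, and after `migrate` go through the same calls, and only the control plane's placement entry changes.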

Uses (in EBS history)

  • SSD retrofit (2013). Servers were drained, SSDs were installed (see patterns/hot-swap-retrofit), and volumes were migrated back.
  • New storage-server types. Provisioned-IOPS servers were a new server class; existing volumes could be migrated onto them without customer action.
  • On-disk format changes. More efficient data structures rolled out fleet-wide by migrating volume state into the new format.
  • Nitro offload transitions. Moving the IO-path software from Xen dom0 to Nitro cards.
  • Custom systems/aws-nitro-ssd adoption. Entirely new media generation, adopted via the same migration primitive.
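On-disk format changes of the kind listed above depend on old and new formats coexisting during the transition. A minimal sketch of one common way to do that, assuming a version byte at the start of each record: the reader accepts both versions, and migration re-encodes into the new one. The two formats here (`encode_v1` as JSON, `encode_v2` as packed binary) and all function names are invented for illustration and say nothing about EBS's actual on-disk layout.

```python
import json
import struct

V1, V2 = 1, 2  # hypothetical format version tags


def encode_v1(blocks):
    """Legacy format (assumed): version byte followed by JSON with hex payloads."""
    body = json.dumps({str(k): v.hex() for k, v in blocks.items()}).encode()
    return bytes([V1]) + body


def encode_v2(blocks):
    """Newer format (assumed): version byte, then (block_no, length, data) records."""
    out = bytearray([V2])
    for block_no, data in sorted(blocks.items()):
        out += struct.pack(">IH", block_no, len(data)) + data
    return bytes(out)


def decode(raw):
    """Read either format by dispatching on the leading version byte."""
    version, body = raw[0], raw[1:]
    if version == V1:
        return {int(k): bytes.fromhex(v) for k, v in json.loads(body).items()}
    if version == V2:
        blocks, offset = {}, 0
        header = struct.calcsize(">IH")
        while offset < len(body):
            block_no, length = struct.unpack_from(">IH", body, offset)
            offset += header
            blocks[block_no] = body[offset:offset + length]
            offset += length
        return blocks
    raise ValueError(f"unknown on-disk format version {version}")


def migrate_format(raw):
    """Re-encode a record of any supported version into the newest format."""
    return encode_v2(decode(raw))
```

Because `decode` understands both versions, a fleet can hold a mix of old and new records while migrations gradually move volume state into the new format.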

Anti-patterns it avoids

  • Forklift cutovers with planned downtime windows.
  • "New fleet, old fleet" parallel operation that doubles capex forever.
  • Customer-driven migration tasks ("please recreate your volume on the new tier").

Seen in

Last updated · 200 distilled / 1,178 read