
SYSTEM Cited by 1 source

dm-clone (Linux Device Mapper)

dm-clone is a Linux kernel device-mapper target that creates a block-level asynchronous clone of a source block device. Given a readable source device, it presents a new device of identical size where:

  • Reads of uninitialised (un-hydrated) blocks fall through to the source device.
  • Reads of hydrated blocks are served locally from the clone.
  • Writes go only to the clone.
  • Background hydration, driven through dm-kcopyd, copies regions from source to clone independently of user I/O.

State is tracked in a separate metadata device. In essence it is a bitmap with one bit per region (a fixed, power-of-two number of sectors) recording "is this region hydrated yet?". Upstream docs: Documentation/admin-guide/device-mapper/dm-clone.rst.
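To make the metadata model concrete, here is a small userspace sketch of a per-region hydration bitmap. It is not dm-clone's on-disk format, which is implemented in drivers/md/dm-clone-metadata.c. `struct hydration_map` and the helper names are hypothetical. It assumes 512-byte sectors and a power-of-two region size, which dm-clone also requires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical in-memory model of a per-region hydration bitmap.
 * Sizes are in 512-byte sectors; region_sectors must be a power of two. */
struct hydration_map {
    uint64_t nr_regions;
    unsigned region_shift;  /* log2(region size in sectors) */
    uint64_t *bits;         /* one bit per region; 1 = hydrated */
};

static bool hmap_init(struct hydration_map *m, uint64_t dev_sectors,
                      uint32_t region_sectors)
{
    assert(region_sectors != 0 &&
           (region_sectors & (region_sectors - 1)) == 0);
    unsigned shift = 0;
    while ((UINT32_C(1) << shift) < region_sectors)
        shift++;
    m->region_shift = shift;
    /* Round up: the last region may be only partially backed by the device. */
    m->nr_regions = (dev_sectors + region_sectors - 1) >> shift;
    /* calloc zeroes the words, so every region starts un-hydrated. */
    m->bits = calloc((size_t)((m->nr_regions + 63) / 64), sizeof(uint64_t));
    return m->bits != NULL;
}

static void hmap_free(struct hydration_map *m)
{
    free(m->bits);
    m->bits = NULL;
}

/* Map a sector to the region that contains it. */
static uint64_t sector_to_region(const struct hydration_map *m, uint64_t sector)
{
    return sector >> m->region_shift;
}

static bool region_hydrated(const struct hydration_map *m, uint64_t r)
{
    return (m->bits[r / 64] >> (r % 64)) & 1u;
}

static void mark_hydrated(struct hydration_map *m, uint64_t r)
{
    m->bits[r / 64] |= UINT64_C(1) << (r % 64);
}
```

With 8-sector (4 KiB) regions, a 1000-sector device has 125 regions, and sector 17 falls in region 2.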

Map function (sketch)

Paraphrasing the clone_map() source that the 2024-07-30 Fly.io post quotes (flush and DISCARD handling omitted):

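static int clone_map(struct dm_target *ti, struct bio *bio)
{
    struct clone *clone = ti->private;
    unsigned long region_nr;

    // REQ_PREFLUSH and DISCARD bios are handled before this point (elided).
    region_nr = bio_to_region(clone, bio);

    if (dm_clone_is_region_hydrated(clone->cmd, region_nr)) {
        // Region already hydrated: serve the bio from the clone.
        remap_and_issue(clone, bio);
        return DM_MAPIO_SUBMITTED;
    } else if (bio_data_dir(bio) == READ) {
        // Read miss: redirect the bio to the source device.
        remap_to_source(clone, bio);
        return DM_MAPIO_REMAPPED;
    }

    // Write miss: the bio targets the clone, but it is held until its region
    // has been hydrated (a write covering the whole region skips the copy).
    remap_to_dest(clone, bio);
    hydrate_bio_region(clone, bio);
    return DM_MAPIO_SUBMITTED;
}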
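The branch structure of the map function can be exercised outside the kernel as a pure function. This is a hedged userspace model. `enum route`, `route_bio()` and its parameters are hypothetical. It folds in dm-clone's full-region overwrite shortcut: a write covering an entire unhydrated region needs no copy from the source. It also assumes a bio never spans more than one region, since dm-clone has device-mapper split bios at region boundaries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names: where a bio ends up under the clone_map() logic. */
enum route {
    ROUTE_CLONE,              /* region hydrated: issue to the clone */
    ROUTE_SOURCE,             /* read miss: redirect to the source */
    ROUTE_HYDRATE_THEN_WRITE, /* partial write miss: copy region, then write */
    ROUTE_OVERWRITE           /* write covers the whole region: no copy */
};

static enum route route_bio(bool region_hydrated, bool is_write,
                            uint64_t bio_sector, uint32_t bio_sectors,
                            uint32_t region_sectors)
{
    assert(region_sectors != 0);
    if (region_hydrated)
        return ROUTE_CLONE;
    if (!is_write)
        return ROUTE_SOURCE;
    /* An aligned write spanning exactly one region replaces all of it, so the
     * source's copy of that region is never needed. */
    if (bio_sector % region_sectors == 0 && bio_sectors == region_sectors)
        return ROUTE_OVERWRITE;
    return ROUTE_HYDRATE_THEN_WRITE;
}
```

Only the last case costs a network round trip before the write completes, which is why small writes to cold regions are the slow path on a partially hydrated clone.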

Role in Fly.io migrations

In the Making Machines Move post, dm-clone is the kernel-side half of Fly's killcloneboot migration. The source device is the origin Volume on the draining worker, made available to the target worker over a network block protocol (iSCSI; Fly tried NBD first). The clone (destination) device is a fresh local volume on the target worker. A new Fly Machine boots with the clone device attached. Reads of unhydrated regions fall through to the source over the network until kcopyd has hydrated them.
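The post does not show how Fly constructs the dm-clone table. The following hypothetical C helper formats a table line in the syntax documented in dm-clone.rst (`clone <metadata dev> <destination dev> <source dev> <region size> [<#feature args> [<feature arg>]*]`). `format_clone_table()` and the device paths are placeholders. The region-size bounds follow the upstream driver: a power of two from 8 sectors (4 KiB) to 2097152 sectors (1 GiB).

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CLONE_MIN_REGION_SECTORS 8u        /* 4 KiB */
#define CLONE_MAX_REGION_SECTORS 2097152u  /* 1 GiB */

/* Hypothetical helper: writes "0 <len> clone <meta> <dest> <src> <region>"
 * (plus an optional no_hydration feature arg) into buf. Returns snprintf's
 * result (the untruncated length), or -1 for an invalid region size. */
static int format_clone_table(char *buf, size_t len, uint64_t dev_sectors,
                              const char *meta, const char *dest,
                              const char *src, uint32_t region_sectors,
                              bool no_hydration)
{
    assert(buf && meta && dest && src);
    bool pow2 = region_sectors != 0 &&
                (region_sectors & (region_sectors - 1)) == 0;
    if (!pow2 || region_sectors < CLONE_MIN_REGION_SECTORS ||
        region_sectors > CLONE_MAX_REGION_SECTORS)
        return -1;
    return snprintf(buf, len, "0 %" PRIu64 " clone %s %s %s %" PRIu32 "%s",
                    dev_sectors, meta, dest, src, region_sectors,
                    no_hydration ? " 1 no_hydration" : "");
}
```

Under these assumptions, the resulting line would be passed as `dmsetup create <name> --table "<line>"`, with the source being the iSCSI-attached origin volume and the destination the fresh local volume. Starting with no_hydration keeps background copying off. dm-clone.rst documents an enable_hydration message for turning it on later.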

"kill, clone, boot is fast; it can be made asymptotically as fast as stateless migration."

TRIM / DISCARD short-circuit

Fly Volumes are typically very sparse. dm-clone supports short-circuiting the hydration of unused blocks via DISCARD:

"A DISCARD issued on the clone device will get picked up by dm-clone, which will simply short-circuit the read of the relevant blocks by marking them as hydrated in the metadata volume."

To drive this, the target worker decrypts the source volume (which requires Fly to work out the LUKS2 plaintext shape; see systems/dm-crypt-luks2), mounts the filesystem, and runs fstrim. The DISCARDs the filesystem issues propagate down the device-mapper stack to dm-clone, which marks those regions hydrated without pulling them over the network. This is the canonical concepts/trim-discard-integration instance.
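The short-circuit works on whole regions. The following is a hedged sketch of the arithmetic. It assumes the upstream driver's approach of short-circuiting only regions that a DISCARD covers completely, so partially covered edge regions still hydrate normally. `discard_region_range()` is a hypothetical name. Sectors are 512 bytes, and a region is 2^region_shift sectors.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: for a discard covering [sector, sector + nr_sectors),
 * return the first fully covered region and how many there are. Only these
 * can be marked hydrated without copying from the source. */
static void discard_region_range(uint64_t sector, uint64_t nr_sectors,
                                 unsigned region_shift,
                                 uint64_t *first, uint64_t *count)
{
    assert(region_shift < 64);
    uint64_t region_size = UINT64_C(1) << region_shift;
    /* Round the start up and the end down to region boundaries. */
    uint64_t start = (sector + region_size - 1) >> region_shift;
    uint64_t end = (sector + nr_sectors) >> region_shift;
    *first = start;
    *count = end > start ? end - start : 0;
}
```

With 8-sector regions, a discard of sectors [4, 24) skips the copy for regions 1 and 2 only. A discard of [1, 7) covers no whole region and saves nothing. Only free extents that contain whole, aligned regions avoid a network copy, and larger regions make this coarser.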

Seen in

Caveats

  • dm-clone needs a metadata device sized for the clone volume's region count; the post does not cover sizing policy.
  • Hydration rate and write-performance characteristics on a partially-hydrated clone are not published by Fly in this post.
  • The post doesn't discuss hydration-policy tuning (throughput pacing, how many regions hydrate concurrently, priority during user I/O bursts). Upstream dm-clone does expose hydration_threshold and hydration_batch_size tunables.
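Because the post leaves hydration policy open, here is a hedged, simplified model of the pacing that upstream dm-clone's tunables describe. In that model, hydration_threshold caps the regions in flight and hydration_batch_size groups contiguous unhydrated regions into one copy. This is not the driver's code. `plan_hydration()`, `struct copy_job`, and the exact stopping rule are hypothetical. Each job stands for one kcopyd copy from source to clone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct copy_job { uint64_t first, count; };  /* hypothetical copy unit */

/* Scan from *cursor, grouping up to `batch` contiguous unhydrated regions per
 * job, and stop once `threshold` regions would be in flight. Returns the
 * number of jobs planned; *cursor is left where the next sweep resumes. */
static size_t plan_hydration(const bool *hydrated, uint64_t nr_regions,
                             uint64_t *cursor, uint64_t in_flight,
                             uint64_t threshold, uint64_t batch,
                             struct copy_job *jobs, size_t max_jobs)
{
    assert(batch > 0);
    size_t n = 0;
    uint64_t r = *cursor;
    while (r < nr_regions && n < max_jobs && in_flight < threshold) {
        if (hydrated[r]) {  /* already local: skip */
            r++;
            continue;
        }
        uint64_t first = r, count = 0;
        while (r < nr_regions && !hydrated[r] && count < batch &&
               in_flight + count < threshold) {
            r++;
            count++;
        }
        jobs[n++] = (struct copy_job){ first, count };
        in_flight += count;
    }
    *cursor = r;
    return n;
}
```

Raising the batch size makes each copy larger, and raising the threshold allows more concurrent copies. Both speed up hydration at the cost of competing harder with user I/O for the network link to the source.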
Last updated · 200 distilled / 1,178 read