CONCEPT Cited by 1 source

TRIM / DISCARD integration

TRIM / DISCARD integration is the signal pathway from the filesystem to the block layer that tells the block device which filesystem blocks are unused, so the block layer can skip unnecessary work on them. SSDs use it for wear-levelling and garbage collection; thin-provisioned volumes use it to reclaim unused space; and, in the wiki-canonical case from the 2024-07-30 Fly.io post, dm-clone uses it to short-circuit the hydration of unused blocks during migration.

How the signal flows

  1. Filesystem knows which blocks are unused (Linux fstrim asks the mounted filesystem, via the FITRIM ioctl, to issue DISCARD requests for its free extents).
  2. Block layer receives DISCARD bios through the block-device interface.
  3. Device-mapper targets receive DISCARD bios through the device-mapper pass-through.
  4. dm-clone's map function intercepts DISCARD and, per upstream source, marks the region as hydrated in the metadata bitmap without actually copying the data.
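The last step can be illustrated with a toy model. This is a simplified sketch, not dm-clone's actual code: the class name, method names, and sizes are hypothetical, and it assumes that only regions wholly covered by a discard are marked hydrated, with partially covered regions still copied later.

```python
class CloneModel:
    """Toy model of a per-region hydration bitmap (hypothetical, not kernel code)."""

    def __init__(self, nr_regions, region_sectors):
        self.nr_regions = nr_regions
        self.region_sectors = region_sectors
        self.hydrated = [False] * nr_regions
        self.fetched = 0  # regions actually copied from the source

    def discard(self, start_sector, nr_sectors):
        """Mark regions fully inside the discarded range as hydrated, copying nothing."""
        end_sector = start_sector + nr_sectors
        first = -(-start_sector // self.region_sectors)  # round start up
        last = end_sector // self.region_sectors         # round end down (exclusive)
        skipped = 0
        for region in range(first, min(last, self.nr_regions)):
            if not self.hydrated[region]:
                self.hydrated[region] = True
                skipped += 1
        return skipped

    def hydrate_all(self):
        """Background hydration: copy every region not yet marked hydrated."""
        for region in range(self.nr_regions):
            if not self.hydrated[region]:
                self.fetched += 1  # stands in for a network read from the source
                self.hydrated[region] = True
        return self.fetched


model = CloneModel(nr_regions=8, region_sectors=8)
skipped = model.discard(start_sector=4, nr_sectors=52)  # covers regions 1..6 fully
fetched = model.hydrate_all()                           # only regions 0 and 7 remain
print(skipped, fetched)  # prints "6 2"
```

In this model the discard turns six of the eight regions into bitmap updates, so background hydration only has to fetch the two regions at the edges.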

Why it's load-bearing for sparse-volume migration

Fly.io's phrasing:

Most people use just a small fraction of the volumes they allocate. A 100GiB volume with just 5MiB used wouldn't be at all weird. You don't want to spend minutes copying a volume that could have been fully hydrated in seconds.

Without TRIM integration, dm-clone would hydrate all 100 GiB of mostly-empty blocks over iSCSI. With TRIM, fstrim on the target side issues DISCARDs that dm-clone short-circuits: the metadata bitmap gets marked hydrated, no network fetches happen for empty blocks, and the clone catches up in seconds rather than minutes.
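A back-of-the-envelope calculation shows the scale of the difference. The 200 MiB/s throughput is a purely hypothetical figure chosen for illustration (the source does not state one), and the estimate ignores metadata updates and per-request overhead.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

# Hypothetical sustained copy throughput; not a figure from the source.
ASSUMED_THROUGHPUT = 200 * MIB  # bytes per second


def copy_seconds(nbytes, throughput=ASSUMED_THROUGHPUT):
    """Time to move nbytes at a constant throughput."""
    return nbytes / throughput


full_volume = copy_seconds(100 * GIB)  # every block hydrated over the network
used_only = copy_seconds(5 * MIB)      # only the blocks that hold data

print(full_volume)  # 512.0 seconds, roughly eight and a half minutes
print(used_only)    # 0.025 seconds
```

Under these assumptions the data transfer itself becomes negligible once unused blocks are skipped, which is consistent with the "seconds rather than minutes" framing.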

What Fly had to do

"To make that work, we need the target worker to see the plaintext of the source volume (so that it can do an fstrim — don't get us started on how annoying it is to sandbox this — to read the filesystem, identify the unused block, and issue the DISCARDs where dm-clone can see them)."

This implies: on the target worker, mount the decrypted view of the source Volume through the DM stack, run fstrim against that mount, then let the DISCARDs propagate back through the DM stack to dm-clone. The "how annoying it is to sandbox this" aside acknowledges that mounting arbitrary customer filesystems on a platform worker is a non-trivial isolation problem.
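For reference, the fstrim step boils down to one FITRIM ioctl on the mounted filesystem. The following is a minimal sketch of that call in Python, not Fly.io's tooling: the function name `trim_mountpoint` is hypothetical, the ioctl number is derived with the generic Linux `_IOWR` encoding (a few architectures encode it differently), and the call needs root privileges on a filesystem and block device that support discard.

```python
import fcntl
import os
import struct

# struct fstrim_range { __u64 start; __u64 len; __u64 minlen; }
FSTRIM_RANGE = struct.Struct("=QQQ")


def _iowr(type_char, nr, size):
    """Generic Linux _IOWR encoding (assumed; differs on some architectures)."""
    return (3 << 30) | (size << 16) | (ord(type_char) << 8) | nr


FITRIM = _iowr("X", 121, FSTRIM_RANGE.size)


def trim_mountpoint(path, start=0, length=2 ** 64 - 1, minlen=0):
    """Ask the filesystem mounted at path to discard its free extents.

    Returns the number of bytes the kernel reports as trimmed.
    """
    buf = bytearray(FSTRIM_RANGE.pack(start, length, minlen))
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        fcntl.ioctl(fd, FITRIM, buf)  # the kernel writes the trimmed byte count into len
    finally:
        os.close(fd)
    _, trimmed, _ = FSTRIM_RANGE.unpack(buf)
    return trimmed
```

Calling `trim_mountpoint` on the mount point of the decrypted view would be the programmatic equivalent of running fstrim there; the resulting DISCARDs then travel down the DM stack as described above.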
