CONCEPT Cited by 1 source
TRIM / DISCARD integration¶
TRIM / DISCARD integration is the filesystem-to-block-layer
signal pathway that lets unused filesystem blocks be
communicated to the block device so the block layer can avoid
unnecessary work on them. SSDs use it for wear-levelling and
garbage collection; thin-provisioned volumes use it to reclaim
unused space; and, in the wiki's canonical case from the 2024-07-30
Fly.io post, dm-clone uses it to short-circuit the hydration of
unused blocks during migration.
How the signal flows¶
- Filesystem knows which blocks are unused (Linux fstrim has the mounted filesystem walk its free space and generate DISCARD requests for every free extent).
- Block layer receives DISCARD bios through the block-device interface.
- Device-mapper targets receive DISCARD bios through the device-mapper pass-through.
- dm-clone's map function intercepts DISCARD and, per upstream source, marks the region as hydrated in the metadata bitmap without actually copying the data.
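The last step can be illustrated with a toy model of a hydration bitmap. This is a minimal sketch, not kernel code: the class name CloneModel, its fields, and the region size are all hypothetical, and the rule that only regions fully covered by a discard are marked is an assumption of the model.

```python
from dataclasses import dataclass, field


@dataclass
class CloneModel:
    """Toy model of a clone target's hydration bitmap (hypothetical, not kernel code)."""

    total_bytes: int
    region_bytes: int  # hypothetical region size chosen for illustration
    hydrated: list = field(init=False)

    def __post_init__(self) -> None:
        n_regions = -(-self.total_bytes // self.region_bytes)  # ceiling division
        self.hydrated = [False] * n_regions

    def discard(self, offset: int, length: int) -> int:
        """Mark regions fully covered by [offset, offset + length) as hydrated.

        Returns the number of regions newly marked. No data is copied.
        """
        first = -(-offset // self.region_bytes)        # round the start up
        last = (offset + length) // self.region_bytes  # round the end down
        marked = 0
        for region in range(first, min(last, len(self.hydrated))):
            if not self.hydrated[region]:
                self.hydrated[region] = True
                marked += 1
        return marked

    def regions_to_copy(self) -> int:
        """Regions that would still have to be fetched from the source."""
        return self.hydrated.count(False)


MiB = 1024 * 1024
clone = CloneModel(total_bytes=64 * MiB, region_bytes=4 * MiB)  # 16 regions
newly_marked = clone.discard(offset=6 * MiB, length=58 * MiB)   # free space from 6 MiB to the end
remaining = clone.regions_to_copy()
```

In this run the discard covers regions 2 through 15 completely, so 14 regions are marked and only 2 remain to be copied. Region 1 is only partly covered by the discard and therefore stays unhydrated in this model.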
Why it's load-bearing for sparse-volume migration¶
Fly.io's phrasing:
Most people use just a small fraction of the volumes they allocate. A 100GiB volume with just 5MiB used wouldn't be at all weird. You don't want to spend minutes copying a volume that could have been fully hydrated in seconds.
Without TRIM integration, dm-clone would hydrate all 100 GiB of
mostly-empty blocks over iSCSI. With TRIM, fstrim on the target
side issues DISCARDs that dm-clone short-circuits: the
metadata bitmap gets marked hydrated, no network fetches happen
for empty blocks, and the clone catches up in seconds rather than
minutes.
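A back-of-the-envelope calculation shows the scale of the difference. The throughput figure below is a hypothetical assumption, not a number from the Fly.io post, and the model ignores the time fstrim itself takes as well as any metadata overhead.

```python
GiB = 1024 ** 3
MiB = 1024 ** 2

# Hypothetical sustained network throughput, chosen only for illustration.
ASSUMED_BYTES_PER_SECOND = 125 * MiB


def copy_seconds(bytes_to_copy: int, bytes_per_second: float) -> float:
    """Time to move a given number of bytes at a constant rate."""
    return bytes_to_copy / bytes_per_second


volume_size = 100 * GiB  # allocated size, from the quoted example
used_size = 5 * MiB      # data actually in use, from the quoted example

full_copy = copy_seconds(volume_size, ASSUMED_BYTES_PER_SECOND)
sparse_copy = copy_seconds(used_size, ASSUMED_BYTES_PER_SECOND)

print(f"copy everything: {full_copy:.1f} s")
print(f"copy used data only: {sparse_copy:.2f} s")
```

Under that assumed rate, copying the whole volume takes about 819 seconds (roughly 13.7 minutes), while copying only the used data takes a small fraction of a second. The absolute numbers depend entirely on the assumed throughput; the ratio between the two cases is what matters.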
What Fly had to do¶
"To make that work, we need the target worker to see the
plaintext of the source volume (so that it can do an fstrim —
don't get us started on how annoying it is to sandbox this — to
read the filesystem, identify the unused block, and issue the
DISCARDs where dm-clone can see them)."
This implies: on the target worker, mount the decrypted view of
the source Volume through the DM stack, run fstrim against that
mount, then let the DISCARDs propagate back through the DM
stack to dm-clone. The "how annoying it is to sandbox this"
aside acknowledges that mounting arbitrary customer filesystems
on a platform worker is a non-trivial isolation problem.
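The trim step itself could be driven from a small wrapper like the one below. This is a sketch under stated assumptions: the mount point path is hypothetical, the function names are invented for illustration, and decrypting, mounting, and sandboxing the filesystem are all out of scope here. It is not Fly.io's implementation. Running fstrim normally requires root privileges.

```python
import subprocess


def fstrim_command(mountpoint: str, verbose: bool = True) -> list:
    """Build the argv for fstrim against a mounted filesystem."""
    cmd = ["fstrim"]
    if verbose:
        cmd.append("--verbose")  # report how many bytes were trimmed
    cmd.append(mountpoint)
    return cmd


def trim_mounted_clone(mountpoint: str) -> subprocess.CompletedProcess:
    """Run fstrim so the filesystem issues DISCARDs down the DM stack.

    Assumes the clone device is already mounted at `mountpoint`.
    Raises CalledProcessError if fstrim exits with a non-zero status.
    """
    return subprocess.run(
        fstrim_command(mountpoint),
        check=True,
        capture_output=True,
        text=True,
    )


# Hypothetical mount point, used here only to show the resulting command.
example_cmd = fstrim_command("/mnt/clone-volume")
```

Keeping command construction separate from execution makes the wrapper easy to inspect or log before anything touches a customer filesystem.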
Seen in¶
- sources/2024-07-30-flyio-making-machines-move — Canonical wiki instance of TRIM / DISCARD short-circuiting clone-tier hydration for sparse volumes.
Related¶
- systems/dm-clone — The consumer of the DISCARD signals.
- concepts/block-level-async-clone — The architectural pattern that TRIM integration optimises.
- systems/fly-volumes — The sparse-volume workload that makes this optimisation matter.