
PATTERN Cited by 2 sources

Checkpoint as metadata clone

Problem

A storage or VM product wants first-class checkpoint / restore — fast enough that users treat checkpoints like git commit (cheap, often, ordinary) rather than snapshot-restore (expensive, rare, emergency-only). Copying bytes at checkpoint time scales linearly with dataset size and blows the latency budget for anything larger than a few gigabytes.

Pattern

On a metadata + chunk storage stack with immutable chunks:

  • Checkpoint = snapshot the metadata DB at an instant. Content chunks are unchanged and shared across all checkpoints.
  • Restore = re-point the running workload to the checkpoint's metadata. Content chunks don't move; the running VM transparently starts reading from the older chunk set.

Cost: O(metadata-size), not O(data-size). For Sprites-shape workloads (dozens-of-MB metadata DBs, hundreds-of-GB content tiers) the difference is 3-4 orders of magnitude.

Precondition: chunks must be immutable. Writes produce new chunks; old chunks referenced by prior checkpoints remain live.
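The mechanics can be sketched in a few lines. This is a hypothetical toy (the class, names, and SHA-256 chunk addressing are illustrative assumptions, not Sprites' actual stack), but it shows why checkpoint and restore cost O(metadata-size): both operations copy only the path→hash map, never the chunks.

```python
import hashlib

class ChunkStore:
    """Toy sketch of a metadata + immutable-chunk store (hypothetical names)."""
    def __init__(self):
        self.chunks = {}       # hash -> bytes; write-once, never mutated
        self.metadata = {}     # path -> chunk hash (the "metadata DB")
        self.checkpoints = {}  # name -> frozen copy of the metadata map

    def write(self, path, data):
        h = hashlib.sha256(data).hexdigest()
        self.chunks[h] = data          # new chunk; prior chunks stay live
        self.metadata[path] = h

    def read(self, path):
        return self.chunks[self.metadata[path]]

    def checkpoint(self, name):
        # O(metadata-size): snapshot the path->hash map, not the content
        self.checkpoints[name] = dict(self.metadata)

    def restore(self, name):
        # O(metadata-size): re-point at the old map; chunks don't move
        self.metadata = dict(self.checkpoints[name])

store = ChunkStore()
store.write("/etc/motd", b"hello")
store.checkpoint("v1")
store.write("/etc/motd", b"damaged")   # write produces a new chunk
store.restore("v1")
print(store.read("/etc/motd"))         # b'hello'
```

Note that after the restore both chunks still exist in the store; the "damaged" chunk is simply no longer referenced by the live metadata, which is exactly what makes the GC question below real.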

Canonical wiki instance — Fly.io Sprites

"This also buys Sprites fast checkpoint and restore. Checkpoints are so fast we want you to use them as a basic feature of the system and not as an escape hatch when things go wrong; like a git restore, not a system restore. That works because both checkpoint and restore merely shuffle metadata around."

(Source: [[sources/2026-01-14-flyio-the-design-implementation-of-sprites]])

User-facing UX from the [[sources/2026-01-09-flyio-code-and-let-live|2026-01-09 launch post]]: break the filesystem deliberately (rm -rf, dd random, pip3 install) → run sprite checkpoint restore v1 → the Sprite is back to the pre-damage state in ~1 second.

Ptacek's guidance — "our pre-installed Claude Code will checkpoint aggressively for you without asking" — leans on checkpoint-as-metadata-clone being cheap enough for agent-driven auto-checkpointing.

UX framing: git restore, not system restore

The cost profile of the pattern reshapes the product:

| Model | UX framing | When used |
|---|---|---|
| Slow checkpoint (O(data)) | system restore | Emergency only |
| Fast checkpoint (O(metadata)) | git restore | Routine workflow |

Both are checkpoint-restore, but the economics flip which workflows the feature fits into.

Adjacencies

  • Git. Git's snapshot model is arguably the archetypal metadata-clone pattern — commits reference blob hashes; content-addressable blobs mean branching / tagging / rewinding are constant-time metadata ops.
  • ZFS / BTRFS snapshots. Filesystem-level pattern with the same shape, at the block level.
  • Copy-on-write memory. Process fork() in Unix — metadata clone (page tables), data shared until divergent writes materialise new pages.
  • Docker image layering. Each layer is metadata over immutable content-addressed layers; branching a layer is metadata-cheap.
  • Git-LFS, dvc. Metadata-pointer files track immutable content-addressed payloads.

Sprites is one of the first cases where a full VM-level disk is wrapped in this model for per-VM interactive checkpoint/restore.

Trade-offs

  • Immutable-chunks GC. Chunks referenced only by deleted checkpoints must be reclaimed. The post doesn't describe Sprites' GC mechanism.
  • Checkpoint retention. Every retained checkpoint roots a metadata snapshot. Storage cost = chunks-uniquely-retained + sum(metadata-snapshot-size).
  • Memory-state checkpoints not addressed. The Sprites post is filesystem-level. Memory + process-state checkpointing is a separate (usually more expensive) operation (CRIU-shape).
  • Pre-existing mutable data. If any tier of the storage stack is mutated in place, the pattern breaks; you would have to either convert that tier to immutable chunks at checkpoint time or refuse to checkpoint data that is being mutated.
  • Metadata-DB scale ceiling. Very large metadata DBs make "clone" itself nontrivial. SQLite-scale (MB) is comfortable; TB-scale metadata breaks the pattern.
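The first trade-off (reclaiming chunks referenced only by deleted checkpoints) has an obvious mark-and-sweep shape, sketched below. This is an assumption on my part; the post doesn't describe Sprites' actual GC mechanism. A chunk is live if the current metadata map or any retained checkpoint snapshot still references it.

```python
def sweep(chunks, live_metadata, checkpoints):
    """Hypothetical mark-and-sweep GC over an immutable chunk store.

    chunks:        dict of chunk-hash -> bytes
    live_metadata: current path -> chunk-hash map
    checkpoints:   dict of name -> retained metadata snapshots
    Returns the chunks that survive collection.
    """
    # Mark: every hash reachable from current metadata or any checkpoint
    live = set(live_metadata.values())
    for snapshot in checkpoints.values():
        live.update(snapshot.values())
    # Sweep: drop chunks no retained root references
    return {h: data for h, data in chunks.items() if h in live}

chunks = {"a": b"1", "b": b"2", "c": b"3"}
metadata = {"/f": "a"}
checkpoints = {"v1": {"/f": "b"}}     # "c" is orphaned (its checkpoint was deleted)
print(sorted(sweep(chunks, metadata, checkpoints)))   # ['a', 'b']
```

Deleting a checkpoint removes a root; the next sweep reclaims whatever only that root kept alive, which is the storage-cost formula above in executable form.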
