CONCEPT Cited by 1 source
Read-through NVMe cache (sparse)¶
Definition¶
A disposable local-NVMe cache sitting in front of an immutable object-store-backed disk. The volume is sparse (backing store grows only as chunks are fetched), content is content-addressable from the object store, and the cache is read-through: misses trigger an object-store GET that populates the local volume. Crucially, the cache is not a durability tier — a worker dying, the NVMe failing, or the VM being migrated is a cache-flush event, not data loss.
Canonical wiki statement¶
Fly.io Sprites, 2026-01-14:
"Our stack sports a dm-cache-like feature that takes advantage of attached storage. A Sprite has a sparse 100GB NVMe volume attached to it, which the stack uses to cache chunks to eliminate read amplification. Importantly (I can feel my resting heart rate lowering) nothing in that NVMe volume should matter; stored chunks are immutable and their true state lives on the object store."
(Source: [[sources/2026-01-14-flyio-the-design-implementation-of-sprites]])
Why sparse + read-through works¶
Three invariants combine:
- Chunks are immutable and content-addressed in the object-store root. Two hosts reading the same chunk see byte-identical content; the cache key equals the chunk identifier.
- The authoritative copy is elsewhere. The local cache's correctness is "is this chunk ID's content locally available? If not, fetch." — no coherence protocol is needed.
- Cache miss on a bound chunk equals "do a GET". Not a data-loss event; not a restore-from-snapshot event.
Result: the usual stressful properties of a durable local-NVMe store — fsync semantics, replication, backup cadence, worker-affinity — are offloaded to the object store. The NVMe does the one job it's best at: fast reads.
Ptacek: "(I can feel my resting heart rate lowering)"
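The three invariants above can be sketched in a few lines. A minimal model, not Fly's implementation — the `ObjectStore` and `ReadThroughCache` names and API are hypothetical, and a Python dict stands in for the NVMe volume:

```python
import hashlib

class ObjectStore:
    """Stand-in for the authoritative, content-addressed object store."""
    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()  # cache key == chunk identifier
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

class ReadThroughCache:
    """Disposable local cache: a miss is a GET, not a failure."""
    def __init__(self, store: ObjectStore):
        self.store = store
        self.local = {}  # models the NVMe volume

    def read(self, chunk_id: str) -> bytes:
        if chunk_id not in self.local:        # miss: read through to the store
            self.local[chunk_id] = self.store.get(chunk_id)
        return self.local[chunk_id]

    def flush(self):
        """Worker death / migration: drop everything, lose nothing."""
        self.local.clear()

store = ObjectStore()
cid = store.put(b"chunk bytes")
cache = ReadThroughCache(store)
assert cache.read(cid) == b"chunk bytes"  # miss -> GET -> populate
cache.flush()                             # cache-flush event, not data loss
assert cache.read(cid) == b"chunk bytes"  # cold read is still correct
```

Because content is immutable and keyed by its identity, `flush()` needs no coherence protocol, no write-back, no restore path — correctness is unchanged, only latency.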
Sparse: why it matters¶
A sparse 100 GB volume only physically allocates blocks as chunks are materialised. Practical consequences:
- Thin provisioning: a freshly-created Sprite has a 100 GB capacity but ~0 GB of actual NVMe usage. Per-worker capacity scales by workloads-in-use, not by per-Sprite reservation.
- Billing alignment: users are billed for chunks actually written, not the 100 GB capacity ("Sprites bill only for what you actually use (in particular: only for storage blocks you actually write, not the full 100GB capacity)").
- Cheap to have lots of empty Sprites. Compatible with the warm-pool pattern (concepts/warm-sprite-pool).
- Evict-friendly. Evicting a chunk from NVMe reclaims physical space without affecting the logical disk view.
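The thin-provisioning property is observable with any sparse-capable filesystem: logical size and physical allocation diverge. A small demonstration (behaviour of `st_blocks` after `truncate` is filesystem-dependent; on common Linux filesystems like ext4 the file below allocates no data blocks):

```python
import os
import tempfile

# Create a file with 100 GiB of logical capacity but ~0 bytes allocated.
path = os.path.join(tempfile.mkdtemp(), "volume.img")
with open(path, "wb") as f:
    f.truncate(100 * 1024**3)  # logical size only; no blocks are written

st = os.stat(path)
logical = st.st_size            # 107374182400 bytes
physical = st.st_blocks * 512   # actual allocation; ~0 on sparse filesystems
print(logical, physical)
```

Writing a chunk into the file later allocates only the blocks that chunk touches, which is what aligns billing with bytes actually written rather than the 100 GB capacity.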
The dm-cache-like qualifier¶
dm-cache is a Linux device-mapper target that stacks a smaller, faster device in front of a larger, slower one for block-level caching. Ptacek's phrasing — dm-cache-like, not dm-cache — signals:
- The architectural shape matches dm-cache (block-level, read-through, policy-driven eviction) …
- … but the implementation is custom — plausibly integrated with the chunk-addressed metadata layer rather than running at the opaque-block tier that dm-cache targets.
Eviction policy, cache-line size, write-through vs. write-back behaviour, and dirty-page flushing semantics are not disclosed in the post.
Eliminating read amplification¶
The cache's explicit role: "eliminate read amplification."
Without the cache, each read would:
- Consult metadata to find the chunk key.
- GET the chunk from the object store.
- Extract the relevant byte range.
Under sequential reads or hot pages, steps 2-3 repeat redundantly. With the cache, step 2 is amortised: first read populates the NVMe, subsequent reads hit NVMe at local-bus latency.
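The amortisation is easy to make concrete. A toy model of the three steps — the chunk size and the offset-to-chunk mapping here are illustrative assumptions, not disclosed values:

```python
# Hypothetical fixed chunk size; the real chunk size is not disclosed.
CHUNK = 64 * 1024

class CountingStore:
    """Object store that counts GETs, to make amplification visible."""
    def __init__(self, chunks):
        self.chunks = chunks
        self.gets = 0

    def get(self, idx):
        self.gets += 1
        return self.chunks[idx]

def read(store, cache, offset, length):
    """Steps 1-3: map offset to chunk, GET on miss, slice the byte range."""
    idx = offset // CHUNK                 # 1. metadata lookup (modeled)
    if idx not in cache:                  # 2. GET only on a miss
        cache[idx] = store.get(idx)
    lo = offset % CHUNK
    return cache[idx][lo:lo + length]     # 3. extract the byte range

store = CountingStore({0: bytes(CHUNK)})
cache = {}
for off in range(0, CHUNK, 4096):         # sixteen sequential 4 KiB reads
    read(store, cache, off, 4096)
print(store.gets)  # 1: the first read paid the GET; the rest hit the cache
```

Without the cache dict, the loop would issue sixteen GETs for one chunk's worth of data — that repetition is the read amplification the cache exists to eliminate.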
This is the bus-hop-on-hot-path half of the bus-hop trade. Fly Volumes kept both halves (bus-hop reads and bus-hop durability = worker-anchored). Sprites split them: bus-hop reads stay local (cache), durability ships to object store (root).
What happens on worker migration / failure¶
- Cold worker (Sprite moves to a host with no prior cache): first touches of every chunk pay the object-store latency. Warm-up is gradual, driven by access patterns.
- Dead worker (NVMe unavailable): same as a cold worker on whatever worker the Sprite gets re-started on. No data loss. No restore. Cache is "just" missing.
- Checkpoint-restore: similarly cheap — see concepts/fast-checkpoint-via-metadata-shuffle and patterns/checkpoint-as-metadata-clone.
Operational numbers not disclosed¶
- Cache-hit rate (at steady state, on first boot, after migration).
- Object-store GET latency distribution on miss.
- Chunk size / cache-line size.
- Prefetch policy (eager? sequential-detect? driven by metadata?).
- Write path: write-back to object store via what policy? (The post doesn't separately describe the write path, only "cache to eliminate read amplification".)
- Coexistence with the Sprite workload's own page cache.
Seen in¶
- [[sources/2026-01-14-flyio-the-design-implementation-of-sprites]] — canonical wiki statement. "Nothing in that NVMe volume should matter."
Related¶
- systems/fly-sprites
- systems/juicefs — the filesystem stack this cache lives under.
- concepts/object-storage-as-disk-root — the durability decision this cache preserves perf for.
- concepts/immutable-object-storage — the invariant the cache leans on.
- concepts/bus-hop-storage-tradeoff — the framing this concept unbundles.
- patterns/read-through-object-store-volume
- companies/flyio