
PATTERN Cited by 2 sources

Read-through object-store volume

Problem

A VM's disk needs to be:

  1. Durable beyond any single host (so the host is disposable).
  2. Fast on hot-path reads (so the workload is usable).
  3. Cheap to snapshot, clone, and migrate.
  4. Portable — the workload isn't anchored to one physical host.

A pure local-NVMe volume fails (1) and (4). A pure-object-store volume fails (2). A volume that layers NVMe read-through caching in front of an object-store-rooted durable tier gets all four — at the cost of two operational moving parts.

Pattern

Expose a block device / filesystem / virtual volume to the VM. Implement it as:

  1. Chunk storage in an S3-compatible object store. Chunks are immutable, content-addressed.
  2. Metadata DB mapping file / block addresses to chunk IDs. Lives adjacent to the running VM; durable via replication to the object store (e.g. Litestream) or equivalent.
  3. Sparse local NVMe volume as a dm-cache-style read-through cache. First access of a chunk fetches it from the object store and populates the cache; subsequent accesses hit NVMe at local-bus latency.

Key invariant: the NVMe cache is disposable. Worker death, disk failure, migration, or eviction = cache invalidation, not data loss.
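The three tiers above can be sketched as a read path. This is a minimal illustration, not Sprites' implementation: `object_store`, `block_map`, and their methods are hypothetical interfaces standing in for the S3 store, the metadata DB, and the sparse NVMe cache directory.

```python
import os

class ReadThroughVolume:
    """Sketch of the read path of a read-through object-store volume.

    Assumed (hypothetical) interfaces:
      object_store.get(chunk_id) -> bytes   # authoritative, slow (S3 GET)
      block_map.chunk_for(block) -> chunk_id  # metadata-DB lookup
    """

    def __init__(self, object_store, block_map, cache_dir):
        self.object_store = object_store
        self.block_map = block_map
        self.cache_dir = cache_dir  # sparse local NVMe; safe to wipe at any time

    def read_block(self, block_no):
        chunk_id = self.block_map.chunk_for(block_no)
        path = os.path.join(self.cache_dir, chunk_id)
        try:
            with open(path, "rb") as f:            # hot path: NVMe hit
                return f.read()
        except FileNotFoundError:
            data = self.object_store.get(chunk_id)  # cold path: object-store GET
            tmp = path + ".tmp"
            with open(tmp, "wb") as f:
                f.write(data)
            os.replace(tmp, path)                  # atomically populate the cache
            return data
```

Because chunks are immutable and content-addressed, cache population needs no invalidation protocol: a cached chunk can never be stale, and wiping `cache_dir` loses nothing.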

Writes land either:

  • Synchronously to the object store (slow, safe) — Sprites' post doesn't specify the write policy; the "writes produce new immutable chunks" framing suggests eventual object-store persistence.
  • Or via a write-through / write-back scheme at the cache tier with background flush — the Sprites post doesn't elaborate.
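Since the post doesn't disclose the write path, here is a hedged sketch of the second option only: a write-back tier where writes buffer locally as new immutable, content-addressed chunks and a background flush makes them durable. All names and interfaces are assumptions, not Sprites' API.

```python
import hashlib

class WriteBackCache:
    """Hypothetical write-back sketch (not Sprites' disclosed design).

    Assumed interfaces:
      object_store.put(chunk_id, data)      # durable once this returns
      block_map.set_chunk(block, chunk_id)  # metadata-DB update
    """

    def __init__(self, object_store, block_map):
        self.object_store = object_store
        self.block_map = block_map
        self.dirty = {}  # chunk_id -> bytes awaiting object-store PUT

    def write_block(self, block_no, data):
        chunk_id = hashlib.sha256(data).hexdigest()  # content address
        self.dirty[chunk_id] = data
        # The block map now references a chunk that is not yet durable; a
        # crash before flush() loses this write -- the write-back risk.
        self.block_map.set_chunk(block_no, chunk_id)

    def flush(self):
        # In a real system this runs as a background loop; a write is
        # durable only after its PUT completes.
        for chunk_id, data in list(self.dirty.items()):
            self.object_store.put(chunk_id, data)
            del self.dirty[chunk_id]
```

The synchronous option collapses `write_block` and `flush` into one step: slower per write, but the durability window disappears.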

Canonical wiki instance — Fly.io Sprites

"Every Sprite comes with 100GB of durable storage. We're able to do that because the root of storage is S3-compatible object storage."

"Our stack sports a dm-cache-like feature that takes advantage of attached storage. A Sprite has a sparse 100GB NVMe volume attached to it, which the stack uses to cache chunks to eliminate read amplification. Importantly (I can feel my resting heart rate lowering) nothing in that NVMe volume should matter; stored chunks are immutable and their true state lives on the object store."

(Source: [[sources/2026-01-14-flyio-the-design-implementation-of-sprites]])

Other wiki instances / lineage

  • JuiceFS — the POSIX-FS shape this pattern usually takes. Sprites' implementation is a JuiceFS fork with SQLite metadata.
  • LSVD — Fly.io's 2023 block-device-level precursor ("bottomless S3-backed volumes"). Same pattern at raw-block granularity. Per Ptacek, the LSVD implementation's write latency wasn't adequate for hot Postgres nodes, so it hasn't displaced Fly Volumes.
  • Tigris — not a read-through volume per se, but uses the same metadata-DB + NVMe-byte-cache + S3-origin split at the object-store layer.

Why the pattern works

  • Durability moved to a purpose-built system. Object stores are engineered for 11-nines durability. A system rooted on one inherits that.
  • Hot reads stay fast. The NVMe cache on the hot path preserves the bus-hop read latency that local-NVMe-root volumes get natively. See [[concepts/read-through-nvme-cache]].
  • Migration reduces to a pointer-move. See [[concepts/durable-state-as-url]] — the VM's state is a URL into the chunk store.
  • Snapshot / clone reduces to a metadata-DB clone. Immutable chunks are shareable across snapshots. See [[concepts/fast-checkpoint-via-metadata-shuffle]].
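The snapshot/clone point can be made concrete with a toy block map. This is purely illustrative — a dict standing in for the metadata DB — but it shows why immutability makes snapshots O(metadata) rather than O(data).

```python
# Toy block map: block number -> chunk ID. Chunks are immutable, so a
# snapshot is just a copy of the map; no chunk data moves or is duplicated.
live = {0: "chunk-a", 1: "chunk-b"}
snap = dict(live)          # the "metadata-DB clone"

# A later write allocates a new immutable chunk and repoints the live map.
live[0] = "chunk-c"

# The snapshot is untouched, and chunk-b is now shared by both views.
assert snap == {0: "chunk-a", 1: "chunk-b"}
assert live[1] == snap[1]
```

Migration is the same move one level up: hand the new host the (replicated) metadata DB and let its empty NVMe cache repopulate on demand.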

Trade-offs

  • Write latency ceiling. Object stores are slower than local NVMe on writes. Heavy-fsync workloads (OLTP DBs) suffer.
  • Cold-worker first-touch tax. A Sprite migrating to a worker with no warm cache pays object-store GET latency on first touches of every chunk.
  • Two operational systems. Metadata DB and object store each have their own failure domain; correct ops means understanding both.
  • Chunk-GC story matters. Unreferenced chunks must be reclaimed; otherwise the object store grows monotonically as workloads churn.
  • Cache sizing. The NVMe cache is sized to hold the hot working set. Access patterns with working sets larger than the cache regress to object-store latency.
  • Write-path design not fully disclosed for Sprites — performance characterisation of the write path is still open.
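The chunk-GC trade-off above reduces to mark-and-sweep over the metadata. A minimal sketch, assuming each volume's and snapshot's block map is available as a mapping of block number to chunk ID; a real collector must also fence off chunks referenced by in-flight writes, which this ignores.

```python
def find_garbage(all_chunks, block_maps):
    """Mark-and-sweep over chunk references.

    A chunk is live if any block map (live volume or retained snapshot)
    references it; everything else is reclaimable from the object store.
    """
    live = set()
    for bm in block_maps:                  # mark: union of all references
        live.update(bm.values())
    return sorted(c for c in all_chunks if c not in live)  # sweep list
```

Without some such sweep (or reference counting), immutable-chunk stores grow monotonically: every write allocates a new chunk and nothing ever shrinks.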

Seen in

  • [[sources/2026-01-14-flyio-the-design-implementation-of- sprites]] — canonical wiki instance (Sprites' storage stack).
  • [[sources/2026-02-04-flyio-litestream-writable-vfs]] — database-level variant instance via Litestream VFS hydration mode. Shape-parallel at coarser granularity: the VFS serves reads from object storage while a background loop pulls the whole database to a local file — the same serve-remote-while-populating-local shape, applied at SQLite-page granularity for a single database rather than block granularity for a volume. Explicitly name-checks dm-clone as the ancestor design. Used by the Sprite block map (JuiceFS metadata tier on SQLite + Litestream VFS), composed on top of this pattern's block-level instance.