CONCEPT

Object storage as disk root

Definition

A storage-architecture decision: the authoritative durability tier for a VM's disk is an S3-compatible object store, not local NVMe. Local NVMe may still sit on the hot read path (as a cache), but losing a worker's NVMe does not lose user data — because the bytes were never there authoritatively. The object store is the root of durability; local storage is a performance accelerator.

Canonical wiki statement

Fly.io Sprites, 2026-01-14:

"Every Sprite comes with 100GB of durable storage. We're able to do that because the root of storage is S3-compatible object storage. […] Sprites jettison this model. We still exploit NVMe, but not as the root of storage. Instead, it's a read-through cache for a blob on object storage. S3-compatible object stores are the most trustworthy storage technology we have. I can feel my blood pressure dropping just typing the words 'Sprites are backed by object storage.'"

(Source: [[sources/2026-01-14-flyio-the-design-implementation-of-sprites]])

And the orchestration consequence:

"In a real sense, the durable state of a Sprite is simply a URL. Wherever he lays his hat is his home! They migrate (or recover from failed physicals) trivially."

Implications

1. Workload is not anchored to a physical

Fly Volumes' canonical shape is "attached to a specific worker physical". A Machine with a Volume cannot be trivially moved — this broke Fly's drain playbook for three years until the 2024 async block-clone migration shipped.

With object-store-rooted disks, the Machine can be re-started on any worker; the new worker pulls chunks from the object store on demand (through the [[concepts/read-through-nvme-cache|local cache]]) and rebuilds hot state on first touch.

"Before we did Fly Volumes, that was as simple as pushing a 'drain' button on a server. Imagine losing a capability like that."
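The re-start path can be sketched with in-memory dicts standing in for the object store and the NVMe cache (all names hypothetical; the real stack is a JuiceFS fork, not this):

```python
import hashlib

object_store = {}   # chunk hash -> bytes: the authoritative durability tier
nvme_cache = {}     # chunk hash -> bytes: disposable local accelerator

def put_chunk(data: bytes) -> str:
    """Write a chunk to the authoritative tier; return its content address."""
    key = hashlib.sha256(data).hexdigest()
    object_store[key] = data
    return key

def read_chunk(key: str) -> bytes:
    """Read through the cache: a miss is a fetch from the root, never a loss."""
    if key not in nvme_cache:
        nvme_cache[key] = object_store[key]  # rebuild hot state on first touch
    return nvme_cache[key]

key = put_chunk(b"sprite disk block 0")
read_chunk(key)        # warm read on the old worker
nvme_cache.clear()     # the worker dies; its NVMe is gone
assert read_chunk(key) == b"sprite disk block 0"   # new worker just refetches
```

The point of the sketch: nothing in `nvme_cache` is load-bearing, so "re-start on any worker" is the same operation as a cold read.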

2. Physical failure is cache invalidation, not data loss

Local NVMe becoming unreachable (a dead worker, a corrupted filesystem, a disk swap) is a cache-miss event, not a data-loss event. The authoritative bytes are in the object store.

"Attached storage is fast, but can lose data — if a physical blows up, there's no magic that rescues its stored bits. You're stuck with our last snapshot backup. That's fine for a replicated Postgres! It's part of what Postgres replication is for. But for anything without explicit replication, it's a very sharp edge."

3. Durable state reduces to a URL

See concepts/durable-state-as-url. Moving a workload is moving a URL reference; the bytes don't move.
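A toy illustration (record shapes and URLs hypothetical, not Fly's API): "migration" reduces to handing the same manifest URL to a different worker.

```python
# The durable state of the workload is a single URL (hypothetical shape).
sprite = {"id": "sp_123", "disk": "s3://chunks/sprites/sp_123/manifest"}

def start_on(worker: str, sprite: dict) -> dict:
    """Attach a workload to a worker: just record which URL it reads from."""
    return {"worker": worker, "disk": sprite["disk"]}

first = start_on("worker-a", sprite)
second = start_on("worker-b", sprite)   # worker-a failed; recover elsewhere
assert first["disk"] == second["disk"]  # same durable root; no bytes moved
```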

4. Cheap snapshots / forks / checkpoints

Because chunks are immutable and content-addressed in the object store, a snapshot is a metadata operation. See concepts/fast-checkpoint-via-metadata-shuffle.
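Why the snapshot is O(metadata) can be sketched as follows (manifest shape and block naming assumed, not taken from the post):

```python
import hashlib

object_store = {}   # content-addressed, immutable chunks

def put(data: bytes) -> str:
    key = hashlib.sha256(data).hexdigest()
    object_store.setdefault(key, data)   # immutable: same bytes, same address
    return key

# A disk is a manifest: an ordered list of chunk addresses.
disk = [put(b"block-%d" % i) for i in range(4)]

# Snapshot/fork = copy the manifest, not the chunks.
snapshot = list(disk)

# Writes to the live disk produce new chunks; the snapshot is untouched.
disk[0] = put(b"block-0-v2")
assert object_store[snapshot[0]] == b"block-0"
assert object_store[disk[0]] == b"block-0-v2"
```

Because chunks never mutate in place, the snapshot never needs to copy data to stay consistent.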

Trade-offs Fly explicitly names

Fly flags this as "not a pure win" for Fly Machines — the argument for keeping local-NVMe-as-root for some workloads:

"We've always wanted to run Fly Machine disks off object storage (we have an obscure LSVD feature that does this), but the performance isn't adequate for a hot Postgres node in production."

Write latency to an object store (even regional, even hot-path-cached) is orders of magnitude higher than local NVMe. For workloads that commit at every write (Postgres WAL fsync, OLTP flushing), object-store-rooted disks aren't yet competitive.
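Back-of-the-envelope numbers (illustrative orders of magnitude, not measurements from the post) show why commit-per-write workloads feel this:

```python
# Order-of-magnitude guesses, not benchmarks.
NVME_FSYNC_S = 100e-6    # ~100 us for a local NVMe fsync
OBJECT_PUT_S = 20e-3     # ~20 ms for a regional object-store PUT

# A workload that commits synchronously is bounded by durability latency.
commits_per_s_nvme = 1 / NVME_FSYNC_S    # ~10,000 commits/s
commits_per_s_obj = 1 / OBJECT_PUT_S     # ~50 commits/s
slowdown = commits_per_s_nvme / commits_per_s_obj
assert round(slowdown) == 200            # "orders of magnitude" in practice
```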

The NVMe cache is sparse and disposable

A Sprite has a sparse 100 GB NVMe volume attached — sparse because chunk materialisation is lazy. "Nothing in that NVMe volume should matter; stored chunks are immutable and their true state lives on the object store." Cache eviction / loss / worker-swap means "refetch from object store", not "data corruption" or "restore-from-snapshot".
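The "sparse" property admits the same kind of sketch (chunk count and sizes hypothetical): local footprint tracks the working set, not the 100 GB logical size.

```python
LOGICAL_CHUNKS = 25_600          # e.g. 100 GiB of 4 MiB chunks (assumed sizes)

object_store = {i: b"\0" * 8 for i in range(LOGICAL_CHUNKS)}  # tiny stand-in
nvme_cache = {}                  # the sparse local volume

def read(i: int) -> bytes:
    nvme_cache.setdefault(i, object_store[i])   # materialise chunks lazily
    return nvme_cache[i]

for i in range(3):               # the workload only ever touches 3 chunks
    read(i)
assert len(nvme_cache) == 3      # local footprint = working set, not 25,600

nvme_cache.clear()               # eviction / loss / worker-swap: safe
assert read(0) == object_store[0]   # refetch, not restore-from-snapshot
```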

See concepts/read-through-nvme-cache.
Related systems

  • JuiceFS — POSIX FS over object-store chunks + transactional metadata DB. Sprites' storage stack is a JuiceFS fork.
  • LSVD — Fly.io's 2023 experimental "bottomless S3-backed volumes". Same premise at the block-device level. Named by the Sprites post as the Fly-Machine-side experiment whose perf ceiling forced a different shape for Sprites.
  • Tigris — regional S3-compat store. Likely (though not named) the chunk backend.
  • AWS's "S3 Express One Zone" (not discussed in this post, but same architectural direction — object storage collapsing the block/object distinction on the hot path).

Contrast axis

| Axis | Local NVMe as root (Fly Volumes) | Object store as root (Sprites) |
| --- | --- | --- |
| Durability | Worker-local; snapshot-backed | S3-grade 11 nines |
| Migration | Expensive (3 years to ship) | Trivial — re-point a URL |
| Worker failure | Data loss unless replicated | Cache invalidation |
| Write latency | NVMe-grade | Object-store-grade (slower) |
| Hot read latency | NVMe-grade | NVMe-grade (cache) |
| Fits OLTP DBs | Yes | No (performance inadequate) |
| Fits agent workloads | Unnecessarily anchored | Natural fit |

Seen in

  • [[sources/2026-01-14-flyio-the-design-implementation-of-sprites]] — canonical wiki statement; "Object stores are the Internet's Hoover Dams."