CONCEPT

Object storage as disk root

Definition

A storage-architecture decision: the authoritative durability tier for a VM's disk is an S3-compatible object store, not local NVMe. Local NVMe may still sit on the hot read path (as a cache), but losing a worker's NVMe does not lose user data — because the bytes were never there authoritatively. The object store is the root of durability; local storage is a performance accelerator.

Canonical wiki statement

Fly.io Sprites, 2026-01-14:

"Every Sprite comes with 100GB of durable storage. We're able to do that because the root of storage is S3-compatible object storage. […] Sprites jettison this model. We still exploit NVMe, but not as the root of storage. Instead, it's a read-through cache for a blob on object storage. S3-compatible object stores are the most trustworthy storage technology we have. I can feel my blood pressure dropping just typing the words 'Sprites are backed by object storage.'"

(Source: [[sources/2026-01-14-flyio-the-design-implementation-of-sprites]])

And the orchestration consequence:

"In a real sense, the durable state of a Sprite is simply a URL. Wherever he lays his hat is his home! They migrate (or recover from failed physicals) trivially."

Implications

1. Workload is not anchored to a physical

Fly Volumes' canonical shape is "attached to a specific worker physical". A Machine with a Volume cannot be trivially moved — this broke Fly's drain playbook for three years until the 2024 async block-clone migration shipped.

With object-store-rooted disks, the Machine can be re-started on any worker; the new worker pulls chunks from the object store on demand (through the [[concepts/read-through-nvme-cache|local cache]]) and rebuilds hot state on first touch.

"Before we did Fly Volumes, that was as simple as pushing a 'drain' button on a server. Imagine losing a capability like that."
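The re-start path can be sketched with in-memory dicts standing in for the object store and the NVMe cache (all names hypothetical; the real stack is a JuiceFS fork, not this):

```python
import hashlib

object_store = {}   # chunk hash -> bytes: the authoritative durability tier
nvme_cache = {}     # chunk hash -> bytes: disposable local accelerator

def put_chunk(data: bytes) -> str:
    """Write a chunk to the authoritative tier; return its content address."""
    key = hashlib.sha256(data).hexdigest()
    object_store[key] = data
    return key

def read_chunk(key: str) -> bytes:
    """Read through the cache: a miss is a fetch from the root, never a loss."""
    if key not in nvme_cache:
        nvme_cache[key] = object_store[key]  # rebuild hot state on first touch
    return nvme_cache[key]

key = put_chunk(b"sprite disk block 0")
read_chunk(key)        # warm read on the old worker
nvme_cache.clear()     # the worker dies; its NVMe is gone
assert read_chunk(key) == b"sprite disk block 0"   # new worker just refetches
```

The point of the sketch: nothing in `nvme_cache` is load-bearing, so "re-start on any worker" is the same operation as a cold read.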

2. Physical failure is cache invalidation, not data loss

Local NVMe becoming unreachable (a dead worker, a corrupted filesystem, a disk swap) is a cache-miss event, not a data-loss event. The authoritative bytes are in the object store.

"Attached storage is fast, but can lose data — if a physical blows up, there's no magic that rescues its stored bits. You're stuck with our last snapshot backup. That's fine for a replicated Postgres! It's part of what Postgres replication is for. But for anything without explicit replication, it's a very sharp edge."

3. Durable state reduces to a URL

See concepts/durable-state-as-url. Moving a workload is moving a URL reference; the bytes don't move.
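A toy illustration (record shapes and URLs hypothetical, not Fly's API): "migration" reduces to handing the same manifest URL to a different worker.

```python
# The durable state of the workload is a single URL (hypothetical shape).
sprite = {"id": "sp_123", "disk": "s3://chunks/sprites/sp_123/manifest"}

def start_on(worker: str, sprite: dict) -> dict:
    """Attach a workload to a worker: just record which URL it reads from."""
    return {"worker": worker, "disk": sprite["disk"]}

first = start_on("worker-a", sprite)
second = start_on("worker-b", sprite)   # worker-a failed; recover elsewhere
assert first["disk"] == second["disk"]  # same durable root; no bytes moved
```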

4. Cheap snapshots / forks / checkpoints

Because chunks are immutable and content-addressed in the object store, a snapshot is a metadata operation. See concepts/fast-checkpoint-via-metadata-shuffle.
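Why the snapshot is O(metadata) can be sketched as follows (manifest shape and block naming assumed, not taken from the post):

```python
import hashlib

object_store = {}   # content-addressed, immutable chunks

def put(data: bytes) -> str:
    key = hashlib.sha256(data).hexdigest()
    object_store.setdefault(key, data)   # immutable: same bytes, same address
    return key

# A disk is a manifest: an ordered list of chunk addresses.
disk = [put(b"block-%d" % i) for i in range(4)]

# Snapshot/fork = copy the manifest, not the chunks.
snapshot = list(disk)

# Writes to the live disk produce new chunks; the snapshot is untouched.
disk[0] = put(b"block-0-v2")
assert object_store[snapshot[0]] == b"block-0"
assert object_store[disk[0]] == b"block-0-v2"
```

Because chunks never mutate in place, the snapshot never needs to copy data to stay consistent.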

Trade-offs Fly explicitly names

Fly flags this as "not a pure win" for Fly Machines — the argument for keeping local-NVMe-as-root for some workloads:

"We've always wanted to run Fly Machine disks off object storage (we have an obscure LSVD feature that does this), but the performance isn't adequate for a hot Postgres node in production."

Write latency to an object store (even regional, even hot-path-cached) is orders of magnitude higher than local NVMe. For workloads that commit at every write (Postgres WAL fsync, OLTP flushing), object-store-rooted disks aren't yet competitive.
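Back-of-the-envelope numbers (illustrative orders of magnitude, not measurements from the post) show why commit-per-write workloads feel this:

```python
# Order-of-magnitude guesses, not benchmarks.
NVME_FSYNC_S = 100e-6    # ~100 us for a local NVMe fsync
OBJECT_PUT_S = 20e-3     # ~20 ms for a regional object-store PUT

# A workload that commits synchronously is bounded by durability latency.
commits_per_s_nvme = 1 / NVME_FSYNC_S    # ~10,000 commits/s
commits_per_s_obj = 1 / OBJECT_PUT_S     # ~50 commits/s
slowdown = commits_per_s_nvme / commits_per_s_obj
assert round(slowdown) == 200            # "orders of magnitude" in practice
```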

The NVMe cache is sparse and disposable

A Sprite has a sparse 100 GB NVMe volume attached — sparse because chunk materialisation is lazy. "Nothing in that NVMe volume should matter; stored chunks are immutable and their true state lives on the object store." Cache eviction / loss / worker-swap means "refetch from object store", not "data corruption" or "restore-from-snapshot".
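The "sparse" property admits the same kind of sketch (chunk count and sizes hypothetical): local footprint tracks the working set, not the 100 GB logical size.

```python
LOGICAL_CHUNKS = 25_600          # e.g. 100 GiB of 4 MiB chunks (assumed sizes)

object_store = {i: b"\0" * 8 for i in range(LOGICAL_CHUNKS)}  # tiny stand-in
nvme_cache = {}                  # the sparse local volume

def read(i: int) -> bytes:
    nvme_cache.setdefault(i, object_store[i])   # materialise chunks lazily
    return nvme_cache[i]

for i in range(3):               # the workload only ever touches 3 chunks
    read(i)
assert len(nvme_cache) == 3      # local footprint = working set, not 25,600

nvme_cache.clear()               # eviction / loss / worker-swap: safe
assert read(0) == object_store[0]   # refetch, not restore-from-snapshot
```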

See concepts/read-through-nvme-cache.
Related systems

  • JuiceFS — POSIX FS over object-store chunks + transactional metadata DB. Sprites' storage stack is a JuiceFS fork.
  • LSVD — Fly.io's 2023 experimental "bottomless S3-backed volumes". Same premise at the block-device level. Named by the Sprites post as the Fly-Machine-side experiment whose perf ceiling forced a different shape for Sprites.
  • Tigris — regional S3-compat store. Likely (though not named) the chunk backend.
  • AWS's "S3 Express One Zone" (not discussed in this post, but same architectural direction — object storage collapsing the block/object distinction on the hot path).

Contrast axis

| Axis | Local NVMe as root (Fly Volumes) | Object store as root (Sprites) |
| --- | --- | --- |
| Durability | Worker-local; snapshot-backed | S3-grade 11 nines |
| Migration | Expensive (3 years to ship) | Trivial — re-point a URL |
| Worker failure | Data loss unless replicated | Cache invalidation |
| Write latency | NVMe-grade | Object-store-grade (slower) |
| Hot read latency | NVMe-grade | NVMe-grade (cache) |
| Fits OLTP DBs | Yes | No (performance inadequate) |
| Fits agent workloads | Unnecessarily anchored | Natural fit |

Seen in

  • [[sources/2026-01-14-flyio-the-design-implementation-of-sprites]] — canonical wiki statement; "Object stores are the Internet's Hoover Dams."