Object storage as disk root¶
Definition¶
A storage-architecture decision: the authoritative durability tier for a VM's disk is an S3-compatible object store, not local NVMe. Local NVMe may still sit on the hot read path (as a cache), but losing a worker's NVMe does not lose user data — because the bytes were never there authoritatively. The object store is the root of durability; local storage is a performance accelerator.
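The read path this implies can be sketched in a few lines. This is an illustrative model only, not Fly's implementation: `ObjectStore`, `NvmeCache`, and `read_chunk` are invented names standing in for the S3-compatible root and the local cache tier.

```python
class ObjectStore:
    """Authoritative durability tier (stands in for an S3-compatible store)."""
    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]


class NvmeCache:
    """Disposable local tier: losing it loses no user data."""
    def __init__(self):
        self._chunks = {}

    def get(self, key):
        return self._chunks.get(key)

    def put(self, key, data):
        self._chunks[key] = data


def read_chunk(key, cache, store):
    data = cache.get(key)
    if data is None:           # cache miss: fall through to the root
        data = store.get(key)
        cache.put(key, data)   # materialise lazily on first touch
    return data
```

The invariant the note describes falls out directly: clearing the cache (a dead worker, a disk swap) and re-reading yields the same bytes, because the cache was never authoritative.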
Canonical wiki statement¶
Fly.io Sprites, 2026-01-14:
"Every Sprite comes with 100GB of durable storage. We're able to do that because the root of storage is S3-compatible object storage. […] Sprites jettison this model. We still exploit NVMe, but not as the root of storage. Instead, it's a read-through cache for a blob on object storage. S3-compatible object stores are the most trustworthy storage technology we have. I can feel my blood pressure dropping just typing the words 'Sprites are backed by object storage.'"
(Source: [[sources/2026-01-14-flyio-the-design-implementation-of-sprites]])
And the orchestration consequence:
"In a real sense, the durable state of a Sprite is simply a URL. Wherever he lays his hat is his home! They migrate (or recover from failed physicals) trivially."
Implications¶
1. Workload is not anchored to a physical¶
Fly Volumes' canonical shape is "attached to a specific worker physical". A Machine with a Volume cannot be trivially moved — this broke Fly's drain playbook for three years until the 2024 async block-clone migration shipped.
With object-store-rooted disks, the Machine can be restarted on any worker; the new worker pulls chunks from the object store on demand (through the [[concepts/read-through-nvme-cache|local cache]]) and rebuilds hot state on first touch.
"Before we did Fly Volumes, that was as simple as pushing a 'drain' button on a server. Imagine losing a capability like that."
2. Physical failure is cache invalidation, not data loss¶
Local NVMe becoming unreachable (a dead worker, a corrupted filesystem, a disk swap) is a cache-miss event, not a data-loss event. The authoritative bytes are in the object store.
"Attached storage is fast, but can lose data — if a physical blows up, there's no magic that rescues its stored bits. You're stuck with our last snapshot backup. That's fine for a replicated Postgres! It's part of what Postgres replication is for. But for anything without explicit replication, it's a very sharp edge."
3. Durable state reduces to a URL¶
See concepts/durable-state-as-url. Moving a workload is moving a URL reference; the bytes don't move.
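A minimal sketch of what "durable state is a URL" buys the orchestrator. The record shape and field names here are assumptions for illustration, not Fly's schema; the point is that migration touches only placement metadata.

```python
from dataclasses import dataclass


@dataclass
class SpriteRecord:
    sprite_id: str
    root_url: str   # the durable state: a reference into the object store
    worker: str     # which physical currently hosts the (disposable) cache


def migrate(record: SpriteRecord, new_worker: str) -> SpriteRecord:
    # No bytes move: only the worker assignment changes.
    # The new worker refills its cache from root_url on first touch.
    return SpriteRecord(record.sprite_id, record.root_url, new_worker)
```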
4. Cheap snapshots / forks / checkpoints¶
Because chunks are immutable and content-addressed in the object store, a snapshot is a metadata operation. See concepts/fast-checkpoint-via-metadata-shuffle.
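Why a snapshot is O(metadata) rather than O(data) can be shown with a toy content-addressed store. This is a sketch of the general technique, not the JuiceFS/Sprites format: SHA-256 addressing and the `Volume`/`manifest` names are assumptions.

```python
import hashlib


def chunk_id(data: bytes) -> str:
    # Content addressing: identical chunks get identical ids.
    return hashlib.sha256(data).hexdigest()


class Volume:
    def __init__(self, store):
        self.store = store      # shared content-addressed chunk store (dict)
        self.manifest = []      # ordered list of chunk ids = the volume

    def append(self, data: bytes):
        cid = chunk_id(data)
        self.store.setdefault(cid, data)  # immutable chunks dedup for free
        self.manifest.append(cid)

    def snapshot(self):
        # A snapshot copies the id list, never the chunk bytes.
        return list(self.manifest)
```

A fork is the same trick in reverse: build a new volume from a copied manifest, sharing every chunk with the parent until either side writes.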
Trade-offs Fly explicitly names¶
Fly flags this as "not a pure win" for Fly Machines — the argument for keeping local-NVMe-as-root for some workloads:
"We've always wanted to run Fly Machine disks off object storage (we have an obscure LSVD feature that does this), but the performance isn't adequate for a hot Postgres node in production."
Write latency to an object store (even regional, even hot-path-cached) is orders of magnitude higher than local NVMe. For workloads that commit at every write (Postgres WAL fsync, OLTP flushing), object-store-rooted disks aren't yet competitive.
The NVMe cache is sparse and disposable¶
A Sprite has a sparse 100 GB NVMe volume attached — sparse because chunk materialisation is lazy. "Nothing in that NVMe volume should matter; stored chunks are immutable and their true state lives on the object store." Cache eviction / loss / worker-swap means "refetch from object store", not "data corruption" or "restore-from-snapshot".
See concepts/read-through-nvme-cache.
Lineage / related systems¶
- JuiceFS — POSIX FS over object-store chunks + transactional metadata DB. Sprites' storage stack is a JuiceFS fork.
- LSVD — Fly.io's 2023 experimental "bottomless S3-backed volumes". Same premise at the block-device level. Named by the Sprites post as the Fly-Machine-side experiment whose perf ceiling forced a different shape for Sprites.
- Tigris — regional S3-compat store. Likely (though not named) the chunk backend.
- AWS S3 Express One Zone (not discussed in this post, but same architectural direction — object storage collapsing the block/object distinction on the hot path).
Contrast axis¶
| Axis | Local NVMe as root (Fly Volumes) | Object store as root (Sprites) |
|---|---|---|
| Durability | Worker-local; snapshot-backed | S3-grade 11-nines |
| Migration | Expensive (3 years to ship) | Trivial — re-point a URL |
| Worker failure | Data loss unless replicated | Cache invalidation |
| Write latency | NVMe-grade | Object-store-grade (slower) |
| Hot read latency | NVMe-grade | NVMe-grade (cache) |
| Fits OLTP DBs | Yes | No (performance inadequate) |
| Fits agent workloads | Unnecessarily anchored | Natural fit |
Seen in¶
- [[sources/2026-01-14-flyio-the-design-implementation-of-sprites]] — canonical wiki statement; "Object stores are the Internet's Hoover Dams."
Related¶
- systems/fly-sprites
- systems/fly-volumes — contrast case.
- systems/lsvd — Fly-Machine-side block-level precedent.
- systems/aws-s3 — S3-compatibility is the only stated constraint on the backend.
- systems/tigris — likely-but-unnamed substrate.
- systems/juicefs — filesystem shape on top.
- concepts/bus-hop-storage-tradeoff — the framing this concept reopens.
- concepts/durable-state-as-url — the orchestration-level consequence.
- concepts/read-through-nvme-cache — the performance-preserving cache layer.
- concepts/metadata-data-split-storage
- patterns/metadata-plus-chunk-storage-stack
- patterns/read-through-object-store-volume
- companies/flyio