CONCEPT Cited by 1 source
Durable state as URL
Definition
A workload-state design in which the authoritative, durable identity of a workload's state is a URL (or a small handle that resolves to a URL) into an object store. Starting the workload on a different physical host is a matter of pointing a fresh VM at the same URL — bytes don't move; a pointer does. Migration, failover, and fleet-drain reduce to re-parenting a URL.
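A minimal Python sketch of the definition, assuming a toy placement table. Placement, Fleet, and the s3://example-bucket/... URL are hypothetical names, not Fly.io's API. The only durable field is the state URL. Migrating rewrites which host runs the workload and leaves the URL, and therefore the state, untouched.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Placement:
    state_url: str            # durable identity of the workload's state
    host: str | None = None   # current physical worker; None means stopped


@dataclass
class Fleet:
    # Hypothetical placement table: workload name -> Placement.
    placements: dict[str, Placement] = field(default_factory=dict)

    def create(self, workload: str, state_url: str, host: str) -> None:
        self.placements[workload] = Placement(state_url, host)

    def migrate(self, workload: str, new_host: str) -> str:
        """Run the workload on new_host against the same state URL.

        Only the host binding changes; no disk bytes are copied here.
        Returns the (unchanged) state URL.
        """
        placement = self.placements[workload]
        placement.host = new_host  # a fresh VM on new_host binds to state_url
        return placement.state_url

    def stop(self, workload: str) -> None:
        # A stopped workload is just a URL bound to nothing running.
        self.placements[workload].host = None


fleet = Fleet()
fleet.create("sprite-a", "s3://example-bucket/sprites/sprite-a", host="worker-1")
url = fleet.migrate("sprite-a", "worker-7")
print(fleet.placements["sprite-a"])
```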
Canonical wiki statement
Fly.io Sprites, 2026-01-14:
"In a real sense, the durable state of a Sprite is simply a URL. Wherever he lays his hat is his home! They migrate (or recover from failed physicals) trivially. It's early days for our internal tooling, but we have so many new degrees of freedom to work with."
(Source: [[sources/2026-01-14-flyio-the-design-implementation-of-sprites]])
Implication
The physical worker holding a Sprite stops mattering. What matters is the URL into the object-store-rooted storage stack. The local NVMe is a cache ([[concepts/read-through-nvme-cache]]); cache-loss ≠ data-loss. Migration is "start a fresh VM on any worker, point it at the URL".
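The cache-loss ≠ data-loss point can be sketched with two toy tiers: a dict-backed object store that counts GETs, and a per-worker read-through cache standing in for local NVMe. Both classes are hypothetical and cover reads only; how Sprites moves writes into the object store is not modelled here.

```python
from __future__ import annotations


class ObjectStore:
    """Hypothetical stand-in for the object store: the durable tier."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}
        self.gets = 0  # count remote GETs so cache behaviour is visible

    def get(self, key: str) -> bytes:
        self.gets += 1
        return self.objects[key]


class WorkerCache:
    """Hypothetical read-through cache standing in for a worker's local NVMe.

    Reads only: how writes reach the object store is out of scope here.
    """

    def __init__(self, store: ObjectStore) -> None:
        self.store = store
        self.local: dict[str, bytes] = {}

    def read(self, key: str) -> bytes:
        if key not in self.local:  # miss: fetch from the durable tier
            self.local[key] = self.store.get(key)
        return self.local[key]


store = ObjectStore()
store.objects["sprite-a/chunk-0"] = b"hello"

old_worker = WorkerCache(store)
old_worker.read("sprite-a/chunk-0")
old_worker.read("sprite-a/chunk-0")  # warm: served locally, no new GET

# The old worker's NVMe is gone (failure or migration). A fresh worker
# starts cold, but the data is still there at the price of a GET.
new_worker = WorkerCache(store)
assert new_worker.read("sprite-a/chunk-0") == b"hello"
print(store.gets)  # prints 2
```

A fresh worker pays one GET per chunk it touches, which is the cold-worker cost listed under Caveats.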
The regained capability: drain
Fly.io's fleet-operations history shows this concept's value most clearly on the drain axis. From the 2024 Making Machines Move retrospective:
Before Fly Volumes, draining a worker was a button push: every workload could be scheduled elsewhere trivially. Volumes broke that, because attached-NVMe storage anchored workloads to specific physicals. Three years of engineering produced [[patterns/async-block-clone-for-stateful-migration|async block-clone migration]] to restore drain as an operation.
Ptacek, in the Sprites post:
"Worse, from our perspective, is that attached storage anchors workloads to specific physicals. We have lots of reasons to want to move Fly Machines around. Before we did Fly Volumes, that was as simple as pushing a 'drain' button on a server. Imagine losing a capability like that. It took 3 years to get workload migration right with attached storage, and it's still not 'easy'."
Sprites' object-store-rooted disks restore the pre-Volume drain property without giving up the durability that Volumes shipped.
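Under durable-state-as-URL, a drain reduces to a loop of pointer updates. A minimal sketch, assuming a hypothetical placement map of workload name to (state URL, host) and naive round-robin target selection; a real scheduler would choose targets differently.

```python
from __future__ import annotations


def drain(
    placements: dict[str, tuple[str, str]],
    draining_host: str,
    hosts: list[str],
) -> list[str]:
    """Move every workload off draining_host by re-pointing it elsewhere.

    placements maps workload name -> (state_url, host) and is updated in
    place. State URLs never change and no disk bytes are copied. Target
    choice is naive round-robin, purely for illustration.
    """
    targets = [h for h in hosts if h != draining_host]
    if not targets:
        raise RuntimeError("no other host to drain onto")
    moved: list[str] = []
    for name in sorted(placements):
        url, host = placements[name]
        if host != draining_host:
            continue
        placements[name] = (url, targets[len(moved) % len(targets)])
        moved.append(name)
    return moved


fleet = {
    "sprite-a": ("s3://example-bucket/sprites/sprite-a", "worker-1"),
    "sprite-b": ("s3://example-bucket/sprites/sprite-b", "worker-2"),
    "sprite-c": ("s3://example-bucket/sprites/sprite-c", "worker-1"),
}
print(drain(fleet, "worker-1", ["worker-1", "worker-2", "worker-3"]))
# prints ['sprite-a', 'sprite-c']
```

With attached storage, each iteration would first need a block copy of the workload's disk (the async block-clone pattern above) before the host could change; here the loop body never touches disk contents.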
Why "URL" specifically
The URL formulation is load-bearing:
- Location-transparent. Different workers resolve the same URL to the same bytes.
- Transferable. The URL is small, shareable, and doesn't require a byte copy to move.
- Testable / loggable / snapshottable. A URL is a first-class identifier; reasoning about "where is this workload's state?" becomes "what URL is bound to this VM's storage stack?".
- Uniform substrate. Object stores are the Internet's "infrastructure megaprojects" — widely deployed, cheap, globally available. A URL into one is a durable primitive almost everywhere.
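To make "location-transparent" and "first-class identifier" concrete, a sketch assuming a hypothetical s3://bucket/prefix URL format (the post does not describe the actual URL scheme Sprites uses), with a nested dict standing in for the object store. parse_state_url and resolve are illustrative names.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def parse_state_url(url: str) -> tuple[str, str]:
    """Split a hypothetical s3://bucket/prefix state URL into (bucket, prefix)."""
    parts = urlsplit(url)
    prefix = parts.path.strip("/")
    if parts.scheme != "s3" or not parts.netloc or not prefix:
        raise ValueError(f"not an s3://bucket/prefix URL: {url!r}")
    return parts.netloc, prefix


def resolve(url: str, buckets: dict[str, dict[str, bytes]]) -> dict[str, bytes]:
    """Return every object stored under the URL's prefix.

    The answer depends only on the URL and the store, never on which worker
    asks: that is the location-transparency property.
    """
    bucket, prefix = parse_state_url(url)
    return {
        key: data
        for key, data in buckets[bucket].items()
        if key.startswith(prefix + "/")  # trailing "/" keeps sprite-ab out
    }


buckets = {
    "example-bucket": {
        "sprites/sprite-a/chunk-0": b"abc",
        "sprites/sprite-a/chunk-1": b"def",
        "sprites/sprite-ab/chunk-0": b"not ours",
    }
}
url = "s3://example-bucket/sprites/sprite-a"
# A log line or a handoff between hosts carries only this short string.
print(f"state={url} objects={sorted(resolve(url, buckets))}")
```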
Relation to the "cloud-object-is-pointer" pattern
A number of modern platforms adopt this shape at different granularities:
- Sprites — the disk is a URL.
- Fly Tigris customer blobs — an individual object is a URL; metadata tracks replicas.
- AWS S3 objects: any S3 object is globally addressable by bucket and key, whether written as an HTTPS URL, an s3:// URI, or an ARN.
- Git repositories — a repo URL is the durable handle; working copies are caches.
- Container registries — a manifest URL is the durable handle; local image stores are caches.
The novel piece in Sprites is granularity — the VM's root disk, not individual artefacts, is the URL.
Degrees of freedom opened up
Ptacek: "We have so many new degrees of freedom to work with." Unstated in the post but implied by the shape:
- Migrate VMs between workers without copying disk.
- Failover a VM to a different worker on physical failure without restore-from-backup.
- Clone / fork a VM by duplicating its metadata rather than its data, then binding the copy to a new VM.
- Share a URL across hosts for read-only inspection.
- Keep many stopped VMs alive cheaply — stopped VM = URL bound to nothing running.
The Sprites post does not claim to exploit these yet ("It's early days for our internal tooling"); they are upside the design makes available rather than shipped features.
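The clone/fork item is easiest to see with a toy metadata model. This sketch assumes, beyond what the post states, that a disk's metadata maps logical blocks to immutable chunk keys and that writes create new chunks rather than overwriting old ones; Disk, write, and fork are hypothetical names. Under those assumptions a fork copies only the map, and parent and child share every chunk until one of them writes.

```python
from __future__ import annotations


class Disk:
    """Toy disk metadata: logical block number -> immutable chunk key."""

    def __init__(self, block_map: dict[int, str]) -> None:
        self.block_map = block_map

    def write(self, block: int, new_chunk_key: str) -> None:
        # Copy-on-write (assumed): the new data lives in a freshly uploaded
        # chunk, and only this disk's map is repointed at it.
        self.block_map[block] = new_chunk_key

    def fork(self) -> Disk:
        # Copy metadata only; no chunk bytes are duplicated.
        return Disk(dict(self.block_map))


parent = Disk({0: "chunk-a", 1: "chunk-b"})
child = parent.fork()
child.write(1, "chunk-c")
print(parent.block_map, child.block_map)
# prints {0: 'chunk-a', 1: 'chunk-b'} {0: 'chunk-a', 1: 'chunk-c'}
```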
Caveats
- "A URL" hides the metadata DB. The full durable-state picture is: a URL to the chunk-store prefix plus a durable copy of the metadata DB (Litestream-backed SQLite in the Sprites case). "State is a URL" is a gist; the literal substrate is "state is a URL + the metadata DB replicated to object storage".
- Ownership / auth. State identified by a URL needs access control. The Sprites post doesn't discuss the auth-token flow; presumably the orchestration code (and only the orchestration code) holds the credentials.
- Performance cost on a cold worker. The first touch of each chunk on a new worker pays object-store GET latency until the NVMe cache is warm.
- Write-latency ceiling. A URL-anchored disk is unsuitable for OLTP-grade workloads (Ptacek names Postgres as the counterexample: "the performance isn't adequate for a hot Postgres node in production").
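The metadata-DB caveat can be made concrete: chunks under a prefix are not a disk until metadata says which chunk holds which block. In this sketch an in-memory sqlite3 database stands in for the metadata DB (Sprites uses SQLite kept durable by Litestream; the schema here is invented) and a dict stands in for the object store. open_metadata and read_block are hypothetical names.

```python
from __future__ import annotations

import sqlite3


def open_metadata() -> sqlite3.Connection:
    # Hypothetical schema: one row per logical block, naming its chunk.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE blocks (block INTEGER PRIMARY KEY, chunk_key TEXT NOT NULL)"
    )
    return db


def read_block(
    db: sqlite3.Connection, chunks: dict[str, bytes], prefix: str, block: int
) -> bytes:
    """Read one logical block: metadata lookup first, then a chunk fetch."""
    row = db.execute(
        "SELECT chunk_key FROM blocks WHERE block = ?", (block,)
    ).fetchone()
    if row is None:
        raise KeyError(f"block {block} has no metadata entry")
    return chunks[f"{prefix}/{row[0]}"]


chunks = {"sprites/sprite-a/c7": b"root fs bytes", "sprites/sprite-a/c9": b"stale"}
meta = open_metadata()
meta.execute("INSERT INTO blocks VALUES (0, 'c7')")

print(read_block(meta, chunks, "sprites/sprite-a", 0))
# With the chunks but without the metadata DB, block 0 is unreadable.
try:
    read_block(open_metadata(), chunks, "sprites/sprite-a", 0)
except KeyError as err:
    print("lost metadata:", err)
```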
Seen in
- [[sources/2026-01-14-flyio-the-design-implementation-of-sprites]]: canonical wiki statement.
Related
- systems/fly-sprites
- systems/fly-volumes — contrast case.
- systems/aws-s3 — URL-addressed substrate.
- concepts/object-storage-as-disk-root — the architectural precondition.
- concepts/fleet-drain-operation — the canonical operation durable-state-as-URL restores.
- concepts/read-through-nvme-cache — the hot-path performance-preserving layer.
- patterns/async-block-clone-for-stateful-migration — the 3-year-to-ship workaround this concept obviates.
- patterns/read-through-object-store-volume
- companies/flyio