
FLYIO 2026-01-09 Tier 3


Fly.io — Code And Let Live

Summary

Thomas Ptacek's 2026-01-09 product-launch-plus-manifesto announcing Sprites, Fly.io's new coding-agent substrate, and arguing that ephemeral read-only sandboxes for coding agents are obsolete. The opening demo runs sprite create, drops into a root shell on a brand-new Linux VM "in about the same amount of time it would take to ssh into a host that already existed", apt-get installs FFmpeg, calls sprite-env checkpoints create ("completes instantly"), walks away for days, comes back, runs sprite console, and finds FFmpeg still there: "Sprites are durable. 100GB capacity to start, no ceremony. Maybe I'll keep it around a few more days, maybe a few months, doesn't matter, just works." The core thesis: "Claude Doesn't Want A Stateless Container." The stateless-container abstraction was built for professional developers who need simplicity, scale-out, and failure-blast-radius reduction; agents are not pro developers and want actual computers with durable storage that survives across tasks, "a thing that doesn't necessarily vanish after a single job is completed." Two supporting arguments: (1) "Simple Wins": without a durable computer, the industry is spending "tens of millions of dollars figuring out how to snapshot and restore ephemeral sandboxes" (rebuild node_modules, re-provision S3 / Redis / RDS outside the sandbox because you can't trust a file to stay put, "round-trip state through 'plan files' which are ostensibly prose but often really just egregiously-encoded key-value stores"), all of which "is unnecessary. Instead of figuring them out, just use an actual computer." (2) "Galaxy Brain Win": software development is changing; for personal apps (Ptacek's vibe-coded kid-MDM on a Sprite, month-long uptime, APNS certs, Anycast URL) "dev is prod, prod is dev", and most useful apps will never need million-user scale. Closing call: "Fuck Ephemeral Sandboxes. […] The age of sandboxes is over. The time of the disposable computer has come."
Sprites' distinguishing properties enumerated: (a) "casually create hundreds of them (without needing a Docker container), each appearing in 1-2 seconds"; (b) "go idle and stop metering automatically, so it's cheap to have lots of them. I use dozens"; (c) "hooked up to our Anycast network, so I can get an HTTPS URL"; (d) "fully durable. They don't die until I tell them to." First-class checkpoint / restore, "like git, but for the whole system", with ~1s restore latency: "fast enough to use casually, interactively. Not an escape hatch. Rather: an intended part of the ordinary course of using a Sprite." Substrate is explicitly not Fly Machines ("They have an entirely new storage stack. They're orchestrated differently. No Dockerfiles"); a follow-up post ("we wrote another 1000 words about how they work") is promised. Self-aware product posture: "Obviously, I'm trying to sell you something here. But that doesn't make me wrong. The argument I'm making is the reason we built the specific thing I'm selling."

Key takeaways

  1. Ephemeral read-only sandboxes are the wrong abstraction for coding agents. The stateless-container model was designed for "professional software developers […] trained to build stateless instances" where persistent state lives in external databases, a shape that "buys you simplicity, flexible scale-out, and reduced failure blast radius." Coding agents ("a hyper-productive five-year-old savant […] works best when you find a way to let it zap itself") want computers (machines that don't vanish after one job and that have durable storage), not "sandboxes". Canonical statement of concepts/durable-vs-ephemeral-sandbox (Source: sources/2026-01-09-flyio-code-and-let-live).
  2. Ephemerality forces the industry to rebuild durable-computer primitives poorly. Without persistent storage: node_modules has to rebuild every session ("the industry is spending tens of millions of dollars figuring out how to snapshot and restore ephemeral sandboxes"); realistic data lives in out-of-sandbox S3 / Redis / RDS ("they're building infrastructure to work around the fact that they can't just write a file and trust it to stay put. Gross"); state round-trips through "'plan files' which are ostensibly prose but often really just egregiously-encoded key-value stores". "I'm saying they're unnecessary. Instead of figuring them out, just use an actual computer."
  3. Durable ≠ immortal: first-class checkpoint + restore makes destructive-mistake recovery the ordinary flow. Ptacek's demo: "Maybe an ill-advised global pip3 install. Or rm -rf $HMOE/bin. Or dd if=/dev/random of=/dev/vdb. Whatever it was, everything's broken. So: sprite checkpoint restore v1 […] Restore took about one second. […] Not an escape hatch. Rather: an intended part of the ordinary course of using a Sprite. Like git, but for the whole system." Canonical concepts/first-class-checkpoint-restore — the key property that distinguishes durable-with-rollback from durable-and-fragile (Source: sources/2026-01-09-flyio-code-and-let-live).
  4. Time-limited ephemeral sandboxes can't host compute-heavy or network-heavy agent workloads. "Most things agents do today don't take much time; in fact, they're often limited only by the rate at which frontier models can crunch tokens. Test suites run quickly. The 99th percentile sandboxed agent run probably needs less than 15 minutes. But there are feature requests where compute and network time swamp token crunching. I built the documentation site for the Sprites API by having a Claude Sprite interact with the code and our API, building and testing examples for the API one at a time. There are APIs where the client interaction time alone would blow sandbox budgets." Canonical motivation for moving past the 15-minute-sandbox design target (Source: sources/2026-01-09-flyio-code-and-let-live).
  5. Sprites target dozens-per-user, not one-per-task. "I can casually create hundreds of them (without needing a Docker container), each appearing in 1-2 seconds. They go idle and stop metering automatically, so it's cheap to have lots of them. I use dozens." The product shape combines fast-VM-boot DX with concepts/scale-to-zero with durability — fast to spin up, cheap to leave alive, rebuildable from a checkpoint.
  6. An Anycast-backed HTTPS URL per VM is a first-class feature. "Hooked up to our Anycast network, so I can get an HTTPS URL." The same Anycast substrate that backs Fly Machines also backs Sprites, so MDM registration, APNS callbacks, preview URLs, and anything else HTTP-driven work without explicit port exposure. (Ptacek ran his personal-MDM Sprite's Anycast URL as an APNS-Push-Certificate MDM-registration endpoint for his kids' devices for a month.)
  7. Personal-software thesis: "dev is prod, prod is dev". The Galaxy-Brain argument — "I have kids. They have devices. I wanted some control over them. So I did what many of you would do in my situation: I vibe-coded an MDM. […] I built this thing with Claude. It's a SQLite-backed Go application running on a Sprite. The Anycast URL my Sprite exports works as an MDM registration URL. Claude also worked out all the APNS Push Certificate drama for me. It all just works. […] I've been running this for a month now, still on a Sprite, and see no reason ever to stop. It is a piece of software that solves an important real-world problem for me. […] For this app, dev is prod, prod is dev." Extends concepts/vibe-coding from the throw-away-prototype framing to a long-running-personal-app framing. Product/scale implication: "you wouldn't want to ship an app to millions of people on a Sprite. But most apps don't want to serve millions of people. The most important day-to-day apps disproportionately won't have million-person audiences."
  8. Sprite substrate is NOT Fly Machines. "We spent years kidding ourselves. We built a platform for horizontal-scaling production applications with micro-VMs that boot so quickly that, if you hold them in exactly the right way, you can do a pretty decent code sandbox with them. But it's always been a square peg, round hole situation. […] They're related to Fly Machines but sharply different in important ways. They have an entirely new storage stack. They're orchestrated differently. No Dockerfiles." The company's self-critique: Fly Machines is optimised for horizontal-scaling stateless production apps; the Sprite abstraction needed a different orchestrator, a different storage stack, and a different image-composition story. Architectural detail is deferred to a promised follow-up ("we wrote another 1000 words about how they work, but I cut them out") (Source: sources/2026-01-09-flyio-code-and-let-live).
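
The time-budget point in takeaway 4 can be made concrete with a little arithmetic. The sketch below is illustrative only: the AgentRun type, the fits_budget helper, and every duration in it are hypothetical, and only the 15-minute figure comes from the post. It shows how a run dominated by network waits overshoots a fixed sandbox limit even when token time is unchanged.

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    """Wall-clock breakdown of one agent task, in seconds (illustrative)."""
    token_seconds: float    # time the model spends generating tokens
    compute_seconds: float  # builds, test suites, local processing
    network_seconds: float  # waiting on external APIs

    def total(self) -> float:
        return self.token_seconds + self.compute_seconds + self.network_seconds


def fits_budget(run: AgentRun, budget_seconds: float) -> bool:
    """True if the whole run finishes inside a fixed sandbox time limit."""
    return run.total() <= budget_seconds


SANDBOX_BUDGET = 15 * 60  # the 15-minute figure quoted in takeaway 4

# Hypothetical numbers, chosen only to show the two regimes.
typical = AgentRun(token_seconds=300, compute_seconds=60, network_seconds=30)
api_docs = AgentRun(token_seconds=300, compute_seconds=120, network_seconds=1800)

print(fits_budget(typical, SANDBOX_BUDGET))   # True: 390 s total
print(fits_budget(api_docs, SANDBOX_BUDGET))  # False: 2220 s total
```

The second run has the same token time as the first; only the network wait changes the verdict, which is the shape of workload the post says blows sandbox budgets.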

Architectural shape (as disclosed)

Dimension | Disclosure
--- | ---
Substrate | Fly.io-operated, not Fly Machines; "entirely new storage stack", "orchestrated differently", "no Dockerfiles"; details deferred to follow-up post.
Creation latency | "each appearing in 1-2 seconds"; "about the same amount of time it would take to ssh into a host that already existed"
Default storage | 100 GB
Idle behaviour | "Go idle and stop metering automatically": concepts/scale-to-zero, but with the VM intact and reachable on wake-up
Durability | "Fully durable. They don't die until I tell them to" (weeks, months, indefinite)
Network | Fly Anycast; HTTPS URL per Sprite
Checkpoint creation | "sprite-env checkpoints create completes instantly"; "didn't even bother to measure"
Checkpoint restore | "about one second", "fast enough to use casually, interactively. Not an escape hatch"
Distribution shape | sprite create CLI; sprite console attaches root shell; sprites.dev hosted signup
Image composition | No Dockerfile; initial contents implicit; checkpoints carry incremental state forward
Agent-first use case | Claude + other coding agents running as tenants of the Sprite with root shell; per-user scale of "dozens" claimed
Intended workload ceiling | Not million-user production apps: "you wouldn't want to ship an app to millions of people on a Sprite"

Systems / concepts / patterns introduced

Systems

  • systems/fly-sprites (new). Durable, checkpointable, Anycast-addressed per-user micro-VMs for coding agents (and small personal apps). ~1-2s create, ~1s restore, 100 GB default, idles to no-meter but doesn't die. Distinct substrate from systems/fly-machines (new storage stack, new orchestrator, no Dockerfile).

Concepts

  • concepts/durable-vs-ephemeral-sandbox (new). The axis Ptacek's thesis turns on: read-only ephemeral sandbox vs. durable checkpointable computer as the host for a coding-agent loop. Which end of the axis fits which workload; why the ephemeral end is "unnecessary" for agents.
  • concepts/first-class-checkpoint-restore (new). The property that distinguishes durable-with-rollback from durable-and-fragile. Characterised by (a) cheap to create ("completes instantly"), (b) cheap to restore (~seconds), (c) "intended part of the ordinary course", not an escape hatch: "like git, but for the whole system."
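
A toy model helps pin down what checkpoint and restore mean as an ordinary workflow step. The sketch below is a minimal in-memory illustration, not a description of how Sprites implement checkpoints (the post does not disclose that). ToyMachine and its methods are hypothetical names, and the deep copy stands in for whatever snapshot mechanism a real system uses.

```python
import copy


class ToyMachine:
    """In-memory stand-in for a durable machine: a dict of path -> contents."""

    def __init__(self):
        self.files = {}
        self._checkpoints = {}

    def checkpoint(self) -> str:
        """Record the full current state and return its name (v1, v2, ...)."""
        name = f"v{len(self._checkpoints) + 1}"
        self._checkpoints[name] = copy.deepcopy(self.files)
        return name

    def restore(self, name: str) -> None:
        """Replace the live state with a copy of the named checkpoint."""
        self.files = copy.deepcopy(self._checkpoints[name])


m = ToyMachine()
m.files["/usr/bin/ffmpeg"] = "binary"
v1 = m.checkpoint()

m.files.clear()   # the destructive mistake
m.restore(v1)     # roll the whole machine back
print(sorted(m.files))  # ['/usr/bin/ffmpeg']
```

The point of the model is the workflow, not the mechanism: the checkpoint covers the whole state rather than a tracked subset, and restoring is a routine call rather than a disaster-recovery procedure.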

Patterns

  • patterns/durable-micro-vm-for-agentic-loop (new). The architectural alternative to patterns/disposable-vm-for-agentic-loop. Same micro-VM isolation substrate, opposite durability posture: the VM persists across agent sessions, node_modules and installed packages stay, the agent can write a file and trust it to stay put; destructive mistakes are recovered via first-class checkpoint/restore, not VM replacement.
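
The difference between the two postures shows up as amortisation of setup work. As a rough sketch, assuming a fixed per-environment setup cost (dependency install and the like), the function below compares cumulative setup time across sessions. The function name and the 90-second figure are hypothetical; the post gives no such numbers.

```python
def total_setup_seconds(sessions: int, setup_seconds: float, durable: bool) -> float:
    """Cumulative dependency-setup time over a number of agent sessions.

    durable=True: the environment persists, so setup is paid once.
    durable=False: every session starts clean and pays setup again.
    """
    if sessions <= 0:
        return 0.0
    times_paid = 1 if durable else sessions
    return times_paid * setup_seconds


SETUP = 90.0  # hypothetical seconds to install dependencies from scratch

for n in (1, 10, 100):
    disposable = total_setup_seconds(n, SETUP, durable=False)
    durable = total_setup_seconds(n, SETUP, durable=True)
    print(n, disposable, durable)
```

Under these assumptions the disposable cost grows linearly with session count while the durable cost stays flat, which is the overhead that snapshot-and-restore tooling for ephemeral sandboxes tries to claw back.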

Existing concepts / systems extended

  • concepts/agentic-development-loop — Ptacek's argument is specifically that "an agent running on an actual computer can exploit the whole lifecycle of the application" (cites Chris McCord's Phoenix.new: "The agent behind a Phoenix.new app runs on a Fly Machine where it can see the app logs from the Phoenix app it generated. When users do things that generate exceptions, Phoenix.new notices and gets to work figuring out what happened") — the three-signal loop extends when the VM doesn't reset between iterations.
  • concepts/agent-with-root-shell — Sprites continue the Phoenix.new posture (root shell co-tenancy between user and agent), now without the per-session-reset constraint.
  • concepts/scale-to-zero — Sprites' "go idle and stop metering automatically" is the scale-to-zero contour applied to a durable substrate — the VM isn't de-allocated, just de-metered; on next access it's still there without a restore.
  • concepts/anycast — fourth Fly.io deployment (after fly-proxy routing, FKS Service endpoints, and Phoenix.new preview URLs) — HTTPS URL per Sprite exposed over the same Anycast fabric.
  • concepts/vibe-coding — personal-app extension: long-running Ptacek-kids-MDM on a Sprite for a month; "dev is prod, prod is dev".
  • systems/fly-machines — Ptacek's self-critique of Fly Machines as "always been a square peg, round hole situation" for the sandbox use case — canonical disclosure that the Sprite project required a different substrate.
  • systems/phoenix-new — explicitly cited as the motivating in-house proof that agents want "to exploit the whole lifecycle of the application" (live log access) — Sprites generalise the Phoenix.new shape beyond Elixir.
  • patterns/disposable-vm-for-agentic-loop: the prior posture this post explicitly revises. The wiki's existing page grew out of sources/2025-02-07-flyio-vscodes-ssh-agent-is-bananas: "a clean-slate Linux instance that spins up instantly". 11 months later Ptacek argues the clean-slate property was load-bearing only for lack of a good durable alternative; Sprites provide the alternative. The two patterns now coexist as a design choice, not a progression.
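
The scale-to-zero item above separates how long a machine exists from how long it is metered. A minimal sketch of that distinction follows, under loud assumptions: metering covers only active intervals, the intervals do not overlap, and the 30-day lifetime and session lengths are invented for illustration. The post does not disclose what triggers idle or how storage is billed while idle.

```python
def metered_seconds(active_intervals):
    """Sum of non-overlapping (start, end) active intervals, in seconds."""
    return sum(end - start for start, end in active_intervals)


DAY = 24 * 3600
lifetime = 30 * DAY  # hypothetical: the machine exists for a month

# Hypothetical usage: three short sessions spread across the month.
active = [
    (0, 3600),                       # one hour on day 0
    (5 * DAY, 5 * DAY + 1800),       # thirty minutes on day 5
    (20 * DAY, 20 * DAY + 900),      # fifteen minutes on day 20
]

metered = metered_seconds(active)
print(metered)             # 6300
print(metered / lifetime)  # fraction of the lifetime that is metered
```

In this model the machine is durable for the full month but metered for under two hours, which is the property that makes keeping "dozens" around plausible.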

Operational numbers disclosed

  • Create latency: "1-2 seconds" per Sprite (claimed); "about the same amount of time it would take to ssh into a host that already existed" (demo).
  • Default storage: 100 GB per Sprite.
  • Checkpoint create: "completes instantly. Didn't even bother to measure."
  • Checkpoint restore: "about one second", "fast enough to use casually, interactively".
  • Ptacek personal-use scale: "I use dozens"; kid-MDM Sprite uptime "a month".
  • Per-user capacity claim: "casually create hundreds of them".
  • Workload ceiling: "you wouldn't want to ship an app to millions of people on a Sprite".

Operational numbers NOT disclosed

  • Storage per Sprite actual ceiling, scaling beyond 100 GB, I/O semantics (local vs. remote), crash consistency guarantees.
  • Checkpoint storage — where they live (per-Sprite local? Fly object storage? Tigris?), eviction policy, size per checkpoint, number-of-checkpoints-retained default.
  • Idle-to-active wake latency (claimed negligible but not numbered).
  • Meter boundaries — what triggers "stop metering" (CPU idle? Network idle? TTY idle?), re-metering policy, billing units for storage during idle.
  • Anycast egress cost for Sprite HTTPS URLs.
  • Isolation boundary — Firecracker? Different VMM? Container-with-strong-isolation? Post is silent on the kernel-level primitive.
  • Orchestrator. "Orchestrated differently" from Fly Machines — but how, what's the scheduler, what's the placement policy?
  • Image composition. "No Dockerfiles" — but what is the image recipe? Is there a base image per language? A sprite init equivalent?
  • Multi-tenancy / co-tenant agent ergonomics — is there a canonical "agent runs as tenant here" mount-point / tool-schema? The post gestures at Claude-as-Sprite-user without disclosing the integration surface.
  • Pricing: "cheap to have lots" but no dollar figures.
  • Comparison-numbers vs. alternatives — no side-by-side with E2B, Modal sandboxes, Replit Agent VMs, Cloudflare Sandbox SDK, AWS Firecracker-based offerings.

Caveats

  • Product-launch voice. The post is explicit about this: "Obviously, I'm trying to sell you something here." The architecture-vs-manifesto ratio is roughly 30/70 — most of the post is the thesis argument, with the product details used as existence proof. The follow-up post ("another 1000 words about how they work") is where the architecture disclosure will land; this post is the framing.
  • "Sandbox" is used loosely. Ptacek's argument is specifically against time-bounded, read-only, single-task ephemeral sandboxes. Sprites are themselves still sandboxed in the kernel/VM-isolation sense — the argument is about persistence, not about removing the isolation boundary.
  • The Galaxy-Brain section is an editorial detour. The "vibe-coded MDM for my kids" + "dev is prod, prod is dev" + "applications that solve real problems for people will be owned by the people they solve problems for" is a personal-software thesis Ptacek flags as speculative ("I lose my team, most of whom don't believe me"). The main architectural claim stands without it; readers can take the Sprite architecture and reject the personal-software thesis, or vice versa.
  • Single-vendor claim. The disposable-VM → durable-VM re-framing is made from the perspective of a vendor with a specific durable product to sell. The argument extends to anyone building for coding agents; whether ephemeral sandboxes should be retired industry-wide vs. paired with durable alternatives is a claim the single-vendor framing can't settle. The existing patterns/disposable-vm-for-agentic-loop use cases (short-lived CI-style test runs, one-shot evaluations, security-boundary-first workloads) aren't obviously obsoleted — Ptacek implicitly addresses this with the "99th percentile sandboxed agent run probably needs less than 15 minutes" claim but doesn't argue all of them fit durable mode.
  • Workload-ceiling claim un-quantified. "You wouldn't want to ship an app to millions of people on a Sprite" leaves open where the ceiling actually sits (tens, hundreds, thousands, or tens of thousands of concurrent users?). Presumably the follow-up post will cover the scaling properties the Sprite substrate gives up to buy durability + checkpointing + per-user isolation.
  • Checkpoint semantics un-specified. Is it memory + disk + network state? Is it kernel-level or filesystem-level? CRIU-based? What happens to in-flight Anycast connections on restore? What happens to in-flight file locks, in-flight DB transactions? ~1s restore for what scope of state is not pinned down.
  • Apologue anecdote fidelity. The "Sprite, noticing my inactivity, goes to sleep. I meet an old friend from high school at the coffee shop" framing is pedagogic; the "days, even. Returning later" + FFmpeg-still-there demo is the load-bearing claim, but the intermediate mechanic (what triggers sleep, what triggers wake, what the wake-up latency actually is) isn't broken out.
  • Cross-vendor positioning absent. The post doesn't position Sprites against E2B, Modal, Daytona, Runloop, Replit Agent, Cloudflare Sandbox SDK, AWS Firecracker, etc.; Ptacek's argument is framed as against ephemeral sandboxes generically, not as Sprites-vs-named-competitor-X.
  • Follow-up post deferred. "We have a lot to say about how Sprites work. […] But for now, I just want you to think about what I'm saying here." Architectural disclosures (storage stack, orchestrator, checkpoint mechanism) are explicitly out-of-scope for this post.

Relationship to existing wiki

  • Reframes, doesn't contradict, patterns/disposable-vm-for-agentic-loop. The wiki's existing pattern page documents the 2025-02-07 VSCode-SSH-bananas sketch + the 2025-06-20 Phoenix.new productisation as "clean-slate, instant, bounded blast radius". This post argues the clean-slate property was a symptom of missing durable alternatives: when durable is available, the blast-radius argument is met by checkpoint/restore instead of by VM replacement. The two patterns now sit as design choices on the same axis, not a progression. Add a ⚠️ note to the disposable-VM page cross-referencing this post without retracting the pattern; ephemeral is still the right fit for some workloads (short-lived CI-like test runs, security-first one-shot evaluations, cases where clean-slate-by-construction is the safety posture).
  • Extends concepts/fast-vm-boot-dx. Fast-boot-DX has been a Fly.io canonical framing since the 2025-02-14 GPU retrospective. Sprites show fast-boot-DX composes with durability — the two were posed as trade-offs in the earlier "ephemeral = fast-boot, durable = EC2-slow" framing; Sprites claim 1-2s boot and indefinite durability.
  • Corroborates concepts/first-class-checkpoint-restore as a separate axis from durability. You can have durable-without-checkpoint (classical EC2), ephemeral-with-snapshot (fake durability through sandbox snapshots, the "tens of millions of dollars" the industry is spending), or durable-with-first-class-checkpoint (Sprites). The third combines the best of the other two.
  • Complements systems/phoenix-new and patterns/ephemeral-vm-as-cloud-ide. Phoenix.new is per-session (session-ends-VM-dies — aligned with the ephemeral-VM-as-cloud-IDE pattern); Sprites are per-user-indefinite (user keeps them for weeks/months). Both coexist in Fly.io's product lineup; this post doesn't obsolete Phoenix.new, it provides the durable-VM counterpart.
  • Generalises concepts/vibe-coding to long-running personal-use apps. Prior wiki framing treats vibe-coding as throw-away-prototyping; Ptacek's kid-MDM is a vibe-coded Go+SQLite app running in production on a Sprite for a month. The framing shifts: vibe-coding doesn't have to mean temporary code.
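
The two-axis reading above (durability on one axis, checkpointing on the other) can be written down as a small lookup. This is only a sketch of the taxonomy: the Posture enum and classify function are hypothetical names, and the fourth cell (ephemeral with no snapshot) is added here to complete the grid; the list above names only three.

```python
from enum import Enum


class Posture(Enum):
    DURABLE_WITHOUT_CHECKPOINT = "durable-without-checkpoint"        # e.g. classical EC2
    EPHEMERAL_WITH_SNAPSHOT = "ephemeral-with-snapshot"              # snapshotted sandboxes
    DURABLE_WITH_CHECKPOINT = "durable-with-first-class-checkpoint"  # the Sprites claim
    EPHEMERAL_PLAIN = "ephemeral-without-snapshot"                   # added to complete the grid


def classify(durable: bool, checkpointable: bool) -> Posture:
    """Place a substrate on the durability x checkpoint grid."""
    if durable and checkpointable:
        return Posture.DURABLE_WITH_CHECKPOINT
    if durable:
        return Posture.DURABLE_WITHOUT_CHECKPOINT
    if checkpointable:
        return Posture.EPHEMERAL_WITH_SNAPSHOT
    return Posture.EPHEMERAL_PLAIN


print(classify(durable=True, checkpointable=True).value)
print(classify(durable=False, checkpointable=True).value)
```

Treating the axes as independent booleans is what lets the disposable-VM and durable-VM patterns sit side by side as design choices rather than as stages of one progression.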

Source
