
CONCEPT Cited by 5 sources

Fast VM boot DX

Definition

The developer-experience property that a VM primitive can be treated like a container or a function: started on demand per request, per call, or per session, with boot measured in milliseconds. Whether boot takes milliseconds or seconds decides which product shapes are feasible. Request-scoped isolation, per-tenant ephemeral sandboxes, scale-to-zero, disposable-VM agentic loops and on-demand parallel compute all become operationally possible at millisecond boot and operationally impossible once the boot budget runs to multiple seconds.

Canonical wiki statement

Fly.io, 2025-02-14:

We like QEMU fine, and could have talked ourselves into a security story for it, but the whole point of Fly Machines is that they take milliseconds to start. We could not have offered our desired Developer Experience on the Nvidia happy-path.

(Source: sources/2025-02-14-flyio-we-were-wrong-about-gpus)

Boot latency is treated as a first-class product requirement, not an engineering-optimisation target.

The two regimes

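Fast-boot (ms)

  • Firecracker: ~125 ms VMM start; the basis of AWS Lambda's per-invocation isolation.
  • Fly Machines: milliseconds to start.
  • Product shapes enabled:
      • Request-scoped and per-invocation isolation.
      • Per-tenant ephemeral sandboxes.
      • Scale-to-zero.
      • Disposable-VM agentic loops.
      • On-demand parallel compute.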

Slow-boot (seconds)

  • QEMU — seconds to boot a general-purpose VM.
  • VMware — similar scale.
  • EC2 t3.micro / GCE e2-micro — seconds to tens of seconds for a general-purpose cloud VM.
  • Product shapes enabled:
      • Long-lived instances (web servers, databases, persistent workloads).
      • Pre-warmed pools with scheduler-maintained capacity.
      • Per-tenant dedicated environments (DevOps-provisioned, days-to-weeks life).

The axis isn't a continuum — it's two regimes with almost no overlap in the product shapes they can host.

Why the regime matters

  • Boot latency gates parallel bring-up. A 64-node GPU cluster booted from an image is ready only when its slowest node is, so bring-up takes max(per-node boot), and the expected maximum grows with node count. Firecracker-class boot makes this a seconds-class thing (Fly.io's Livebook + FLAME pitch). Seconds-class boot per node makes it a minutes-class thing, and minutes-class bring-up is unfit for interactive / notebook workloads.
  • Scale-to-zero economics. If cold start is ms, you can de-allocate instances between requests and the user never notices. If cold start is seconds, you have to keep warm pools, which is scale-to-N rather than scale-to-zero.
  • Sandbox economics. Per-invocation VM isolation (Lambda's architectural commit) is viable because Firecracker boots in ~125 ms. Per-session or per-request VM isolation at QEMU speeds requires pre-warmed pools or extension-request-scoped reuse, both of which weaken the isolation posture.
  • Agentic-loop DX. A closed-loop LLM workflow (concepts/agentic-development-loop) wants to reset the execution sandbox between attempts. ms-boot makes this cheap; seconds-boot makes it slow enough to discourage reset, biasing the loop toward state-reuse.
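A toy model makes the parallel bring-up and scale-to-zero points concrete. This is a minimal sketch under stated assumptions, not a model of any real platform: per-node boot is taken to be a fixed floor plus exponential jitter, and the function names, the 5 s / 1 s parameters and the 0.5 s latency budget are hypothetical. Only the ~125 ms Firecracker figure comes from the text above.

```python
import random
import statistics


def cluster_ready_time(node_boot_times_s: list[float]) -> float:
    # Nodes boot in parallel, so the cluster is ready only when the slowest node is.
    return max(node_boot_times_s)


def mean_cluster_ready_time(n_nodes: int, base_boot_s: float, jitter_mean_s: float,
                            trials: int = 2000, seed: int = 0) -> float:
    # Toy model (an assumption, not a measured distribution): each node's boot time is
    # a fixed floor plus i.i.d. exponential jitter. Returns a Monte Carlo estimate of
    # the expected cluster ready time.
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        boots = [base_boot_s + rng.expovariate(1.0 / jitter_mean_s) for _ in range(n_nodes)]
        samples.append(cluster_ready_time(boots))
    return statistics.fmean(samples)


def needs_warm_pool(cold_start_s: float, latency_budget_s: float) -> bool:
    # Scale-to-zero is only invisible to users if a cold start fits inside their latency
    # budget; otherwise the platform has to keep warm instances around (scale-to-N).
    return cold_start_s > latency_budget_s


if __name__ == "__main__":
    for n in (1, 64):
        print(n, round(mean_cluster_ready_time(n, base_boot_s=5.0, jitter_mean_s=1.0), 2))
    print(needs_warm_pool(0.125, latency_budget_s=0.5))  # False: ms-class cold start fits
    print(needs_warm_pool(10.0, latency_budget_s=0.5))   # True: seconds-class needs a pool
```

Under this toy model the expected ready time has a closed form, base + jitter × H_n, where H_n is the n-th harmonic number. H_64 ≈ 4.74, so a 64-node cluster waits almost five mean-jitters beyond the floor, against one for a single node. The exponential jitter is an assumption; the point is only that the slowest node sets the pace and the tail grows with n.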

Cost of deviating from on-vendor happy paths to keep fast-boot

Fly.io's 2025-02-14 retrospective is the wiki's cleanest case study. The Nvidia driver happy path runs on QEMU / VMware; both are slow-boot hypervisors, so staying on that path would have cost Fly.io its DX requirement. Fly chose Cloud Hypervisor, off the happy path, ate months of failed driver-integration work, and eventually scaled back the GPU product rather than regress to QEMU. The DX constraint was stronger than the driver-compatibility constraint.

Other instances

Caveats

  • "DX" is doing load-bearing work here. This concept is phrased from the platform operator's vantage, but it is the downstream user's (the developer's) experience that matters. The platform can have ms-boot and still ship a bad DX through other surfaces (API latency, scheduler behaviour, error handling): ms-boot is necessary, not sufficient.
  • Boot time is one axis, not the only one. Memory allocation, image pull, filesystem setup and network setup all add to it. Firecracker's ~125 ms is the VMM start alone; the user-visible "Lambda invocation cold start" is a composite.
  • Firecracker's boot is fast because the guest kernel is small and the device model is minimal. If the guest needs a full Linux distro + systemd, the boot advantage shrinks.
  • Not every product wants ms-boot. A database doesn't care — it lives for weeks. The concept matters for ephemeral-or-scale-to-zero shapes.
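The caveat that VMM start is only one term can be written down as a sum of phases. A minimal sketch, assuming the phases listed above run one after another; the class and field names are hypothetical, and apart from the ~125 ms VMM start every number in the usage block is invented for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class ColdStartBreakdown:
    # Hypothetical phase breakdown of a user-visible cold start, in milliseconds.
    # Assumes the phases run sequentially; overlapping phases would shrink the total.
    memory_alloc_ms: float
    image_pull_ms: float
    filesystem_setup_ms: float
    network_setup_ms: float
    vmm_start_ms: float      # the part a "~125 ms" Firecracker figure refers to
    guest_boot_ms: float     # guest kernel + init; grows with a full distro + systemd

    def total_ms(self) -> float:
        # User-visible cold start = sum of every sequential phase, not just VMM start.
        return sum(getattr(self, f.name) for f in fields(self))

    def vmm_share(self) -> float:
        # Fraction of the cold start that the VMM-start number actually accounts for.
        return self.vmm_start_ms / self.total_ms()

    def dominant_phase(self) -> str:
        # Name of the slowest phase: the first candidate for optimisation.
        return max(fields(self), key=lambda f: getattr(self, f.name)).name


if __name__ == "__main__":
    # Illustrative numbers only: everything except the 125 ms VMM start is made up.
    cs = ColdStartBreakdown(memory_alloc_ms=20.0, image_pull_ms=300.0,
                            filesystem_setup_ms=50.0, network_setup_ms=30.0,
                            vmm_start_ms=125.0, guest_boot_ms=150.0)
    print(cs.total_ms())        # 675.0
    print(cs.dominant_phase())  # image_pull_ms
```

With these made-up numbers the quoted VMM-start figure covers under a fifth of what the user waits for, which is why a fast VMM alone does not guarantee fast-boot DX.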

Seen in (wiki)

  • sources/2025-02-14-flyio-we-were-wrong-about-gpus — Fly.io's explicit "millisecond boot is the whole point" framing.
  • sources/2026-01-14-flyio-the-design-implementation-of-sprites — Sprites' 1-2 second create comes not from fast cold boot but from a warm pool, an implementation arm of the fast-boot DX promise. Ptacek: "Every physical worker knows exactly what container the next Sprite is going to start with, so it's easy for us to keep pools of 'empty' Sprites standing by. The result: a Sprite create doesn't have any heavy lifting to do; it's basically just doing the stuff we do when we start a Fly Machine." Fast-VM-boot DX can be realised at the create level via warm pools when the product shape allows a uniform base image (see concepts/no-container-image-sprite, patterns/warm-pool-zero-create-path).
  • sources/2024-11-15-allthingsdistributed-aws-lambda-prfaq-after-10-years — AWS Lambda's architectural commit to Firecracker for per-invocation isolation at fast-boot.
  • sources/2026-04-21-figma-server-side-sandboxing-virtual-machines — concrete example of Firecracker boot being both fast-enough (for async) and too-slow (for sync) on the same platform.
  • sources/2026-01-09-flyio-code-and-let-live — Fly.io's 2026-01-09 Sprites launch composes fast-boot-DX with durability: 1-2s create latency and indefinite lifetime and ~1s checkpoint restore. The wiki's earlier fast-boot-DX framing implicitly paired ms-boot with ephemeral-lifecycle (Lambda, disposable-VM agent loops); Sprites show the two axes are independent. Same ms-boot primitive, different lifecycle choice.
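The warm-pool arm of the promise can be sketched in a few lines. This assumes, as the Sprites quote does, that every pooled sandbox starts from the same base image, so any idle one can serve any create request. `WarmPool` and its `factory` callback are hypothetical stand-ins for the slow boot path; nothing here is a Fly.io API.

```python
from collections import deque
from typing import Callable, Deque, Generic, TypeVar

T = TypeVar("T")


class WarmPool(Generic[T]):
    # Hypothetical warm pool: pre-create identical "empty" sandboxes from one known
    # base image so that a create request only has to hand one out.

    def __init__(self, factory: Callable[[], T], target_size: int) -> None:
        self._factory = factory          # stands in for the slow boot path
        self._target = target_size
        self._idle: Deque[T] = deque()
        self.refill()

    def refill(self) -> None:
        # A real system would do this off the request path (e.g. a background task);
        # it runs inline here for brevity.
        while len(self._idle) < self._target:
            self._idle.append(self._factory())

    def acquire(self) -> T:
        # Fast path: hand out a pre-booted sandbox.
        if self._idle:
            return self._idle.popleft()
        # Slow path: pool exhausted, so the boot lands on the request path.
        return self._factory()

    def __len__(self) -> int:
        return len(self._idle)


if __name__ == "__main__":
    boots: list[int] = []

    def boot_empty_sandbox() -> int:
        # Hypothetical slow boot; here it just records that a boot happened.
        boots.append(1)
        return len(boots)

    pool = WarmPool(boot_empty_sandbox, target_size=3)
    sandbox = pool.acquire()   # no boot happens on this call
    print(sandbox, len(boots), len(pool))
```

The create path is a deque pop whenever the pool is non-empty; the slow factory call reaches the request path only when the pool runs dry. A uniform base image is what makes this cheap: with per-user images, the pool would have to be keyed by image, and most pooled sandboxes would be the wrong one.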