CONCEPT Cited by 3 sources

Fast VM boot DX

Definition

The developer-experience property that a VM primitive can be treated like a container or a function: started on demand per request, per call, or per session, with boot time measured in milliseconds. Whether the VM boot budget is milliseconds or seconds determines which product shapes are feasible: request-scoped isolation, per-tenant ephemeral sandboxes, scale-to-zero, disposable-VM agentic loops, and on-demand parallel compute are each operationally possible at ms-class boot and operationally impossible at seconds-class boot.

Canonical wiki statement

Fly.io, 2025-02-14:

We like QEMU fine, and could have talked ourselves into a security story for it, but the whole point of Fly Machines is that they take milliseconds to start. We could not have offered our desired Developer Experience on the Nvidia happy-path.

(Source: sources/2025-02-14-flyio-we-were-wrong-about-gpus)

Boot latency is treated as a first-class product requirement, not an engineering-optimisation target.

The two regimes

Fast-boot (ms)

Slow-boot (seconds)

  • QEMU — seconds to boot a general-purpose VM.
  • VMware — similar scale.
  • EC2 t3.micro / GCE e2-micro — seconds to tens of seconds for a general-purpose cloud VM.
  • Product shapes enabled:
    • Long-lived instances (web servers, databases, persistent workloads).
    • Pre-warmed pools with scheduler-maintained capacity.
    • Per-tenant dedicated environments (DevOps-provisioned, days-to-weeks life).

The axis isn't a continuum — it's two regimes with almost no overlap in the product shapes they can host.

Why the regime matters

  • Latency compounds with parallelism. A 64-node GPU cluster booted in parallel from an image takes max(per-node-boot) to come up, so the slowest node gates the whole cluster. Firecracker-class boot makes this a seconds-class thing (Fly.io's Livebook + FLAME pitch). Seconds-class boot per node makes it a minutes-class thing, and a minutes-class bring-up is unfit for interactive / notebook workloads.
  • Scale-to-zero economics. If cold-start is ms, you can de-allocate instances between requests and the user never notices. If cold-start is seconds, you have to keep warm pools, which is scale-to-N.
  • Sandbox economics. Per-invocation VM isolation (Lambda's architectural commit) is viable because Firecracker boots in ~125 ms. Per-session or per-request VM isolation at QEMU speeds requires pre-warmed pools or reuse of a VM across requests, both of which weaken the isolation posture.
  • Agentic-loop DX. A closed-loop LLM workflow (concepts/agentic-development-loop) wants to reset the execution sandbox between attempts. ms-boot makes this cheap; seconds-boot makes it slow enough to discourage reset, biasing the loop toward state-reuse.
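
The arithmetic behind these bullets can be sketched in a few lines of Python. This is a rough illustrative model, not anything Fly.io or AWS publishes: the function names, the 5-second slow-boot figure, and the 300 ms latency budget are all assumptions chosen for the example. Only the ~125 ms figure comes from the text above.

```python
# Illustrative model only; function names and sample numbers are assumptions.

def cluster_ready_seconds(per_node_boot_seconds):
    """Nodes boot in parallel, so the slowest node gates the cluster."""
    return max(per_node_boot_seconds)


def can_scale_to_zero(cold_start_seconds, latency_budget_seconds):
    """Scale-to-zero goes unnoticed only if a cold start fits the request budget."""
    return cold_start_seconds <= latency_budget_seconds


def reset_overhead_seconds(attempts, boot_seconds):
    """Total time an agentic loop spends re-creating its sandbox, one reset per attempt."""
    return attempts * boot_seconds


FAST_BOOT = 0.125  # ~125 ms, the Firecracker figure cited in the text
SLOW_BOOT = 5.0    # assumed seconds-class boot, for comparison only
BUDGET = 0.3       # assumed per-request latency budget in seconds

if __name__ == "__main__":
    print(cluster_ready_seconds([0.110, FAST_BOOT, 0.118]))
    print(can_scale_to_zero(FAST_BOOT, BUDGET), can_scale_to_zero(SLOW_BOOT, BUDGET))
    print(reset_overhead_seconds(20, FAST_BOOT), reset_overhead_seconds(20, SLOW_BOOT))
```

Under these assumed numbers, a 20-attempt loop pays 2.5 seconds of reset overhead at ms-class boot and 100 seconds at seconds-class boot, which is the gap that biases a loop toward reusing state.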

Cost of deviating from on-vendor happy paths to keep fast-boot

Fly.io's 2025-02-14 retrospective is the wiki's cleanest case study. The Nvidia driver happy path sits on QEMU / VMware, and either would have cost Fly.io its millisecond-boot DX requirement. Fly.io went off-path with Cloud Hypervisor, ate months of failed driver-integration work, and eventually scaled back the GPU product rather than regress to QEMU. The DX constraint was stronger than the driver-compatibility constraint.

Other instances

Caveats

  • "DX" is doing load-bearing work here. This concept is phrased from the platform operator's vantage, but it is the downstream user (the "developer") whose experience matters. The platform can have ms-boot and still ship a bad DX through other surfaces (API latency, scheduler behaviour, error-handling): ms-boot is necessary, not sufficient.
  • Boot time is one axis, not the only one. Memory allocation / image pull / filesystem setup / network setup all compose. Firecracker's 125 ms is the VMM-start; the user-visible "Lambda invocation cold start" is a composite.
  • Firecracker's boot is fast because the guest kernel is small and the device model is minimal. If the guest needs a full Linux distro + systemd, the boot advantage shrinks.
  • Not every product wants ms-boot. A database doesn't care — it lives for weeks. The concept matters for ephemeral-or-scale-to-zero shapes.
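
The composite nature of a cold start can be made concrete with a small sketch. The phase names and durations below are hypothetical placeholders, not measurements; only the 125 ms VMM-start figure comes from the text above. The point is simply that the user-visible cold start is the sum of all phases, so a fast VMM start can be a small share of the total.

```python
# Hypothetical phase breakdown; every value except vmm_start is an assumption.

def cold_start_seconds(phases):
    """User-visible cold start is the sum of all sequential phases."""
    return sum(phases.values())


def vmm_share(phases, vmm_key="vmm_start"):
    """Fraction of the total cold start spent in VMM start alone."""
    return phases[vmm_key] / cold_start_seconds(phases)


example_phases = {
    "image_pull": 0.5,       # assumed
    "vmm_start": 0.125,      # the 125 ms Firecracker figure from the text
    "network_setup": 0.125,  # assumed
}

if __name__ == "__main__":
    print(cold_start_seconds(example_phases))
    print(round(vmm_share(example_phases), 3))
```

With these assumed numbers the VMM start is about a sixth of the total, so shaving it further would matter less than shrinking the image pull.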

Seen in (wiki)
