CONCEPT Cited by 5 sources

Fast VM boot DX¶

Definition¶

The developer-experience property that a VM primitive can be treated like a container or a function: started on demand per request, per call, or per session, with boot time measured in milliseconds. Whether boot takes milliseconds or seconds determines which product shapes are feasible: request-scoped isolation, per-tenant ephemeral sandboxes, scale-to-zero, disposable-VM agentic loops, and on-demand parallel compute are operationally practical at millisecond boot and impractical at multi-second boot.

Canonical wiki statement¶

Fly.io, 2025-02-14:

We like QEMU fine, and could have talked ourselves into a security story for it, but the whole point of Fly Machines is that they take milliseconds to start. We could not have offered our desired Developer Experience on the Nvidia happy-path.

(Source: sources/2025-02-14-flyio-we-were-wrong-about-gpus)

Boot latency is treated as a first-class product requirement, not an engineering-optimisation target.

The two regimes¶

Fast-boot (ms)¶

Firecracker — ~125 ms cold boot, designed for AWS Lambda per-invocation isolation.
Intel Cloud Hypervisor — similar micro-VM posture, additionally supports PCI passthrough.
Product shapes enabled:
Per-request VM isolation (AWS Lambda).
Scale-to-zero with sub-second cold start (concepts/scale-to-zero).
Per-session disposable VMs for agentic loops (patterns/disposable-vm-for-agentic-loop).
64-node GPU cluster "in seconds rather than minutes" (concepts/seconds-scale-gpu-cluster-boot).
Per-invocation link-preview / image-processing sandboxes (Figma's Lambda use, 2026-04-21).

Slow-boot (seconds)¶

QEMU — seconds to boot a general-purpose VM.
VMware — similar scale.
EC2 t3.micro / GCE e2-micro — seconds to tens of seconds for a general-purpose cloud VM.
Product shapes enabled:
Long-lived instances (web servers, databases, persistent workloads).
Pre-warmed pools with scheduler-maintained capacity.
Per-tenant dedicated environments (DevOps-provisioned, days-to-weeks life).

The axis isn't a continuum — it's two regimes with almost no overlap in the product shapes they can host.

Why the regime matters¶

Parallel fan-out is gated by the slowest node. A 64-node GPU cluster booted from an image takes max(per-node-boot) to come up, so the per-node boot regime sets the cluster's regime. Firecracker-class boot makes this a seconds-class thing (Fly.io's Livebook + FLAME pitch). Seconds-class boot per node makes it a minutes-class thing, and minutes-class cluster boot is unfit for interactive / notebook workloads.
Scale-to-zero economics. If cold-start is ms, you can de-allocate instances between requests and the user never notices. If cold-start is seconds, you have to keep warm pools, which is scale-to-N.
Sandbox economics. Per-invocation VM isolation (Lambda's architectural commit) is viable because Firecracker boots in ~125 ms. Per-session or per-request VM isolation at QEMU speeds requires pre-warmed pools or reuse of a VM across requests, both of which weaken the isolation posture.
Agentic-loop DX. A closed-loop LLM workflow (concepts/agentic-development-loop) wants to reset the execution sandbox between attempts. ms-boot makes this cheap; seconds-boot makes it slow enough to discourage reset, biasing the loop toward state-reuse.
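
The parallel-boot and reset-cost arguments above reduce to simple arithmetic. The following is a minimal sketch, assuming all nodes boot concurrently and counting only boot time (no image pull, scheduling, or network setup). The function names and every number other than Firecracker's ~125 ms are illustrative assumptions, not measurements.

```python
def cluster_ready_seconds(per_node_boot_seconds):
    """Nodes boot in parallel, so the cluster is ready when the slowest node is."""
    return max(per_node_boot_seconds)


def reset_overhead_fraction(boot_seconds, work_seconds):
    """Share of one agentic-loop attempt spent re-booting a fresh sandbox."""
    return boot_seconds / (boot_seconds + work_seconds)


# Hypothetical inputs (assumptions for illustration only).
fast_nodes = [0.125] * 64           # Firecracker-class boot on every node
slow_nodes = [10.0] * 63 + [45.0]   # seconds-class boot with one straggler

print(cluster_ready_seconds(fast_nodes))               # 0.125
print(cluster_ready_seconds(slow_nodes))               # 45.0

# An attempt that does 5 s of useful work, then resets its sandbox.
print(round(reset_overhead_fraction(0.125, 5.0), 3))   # 0.024
print(round(reset_overhead_fraction(10.0, 5.0), 3))    # 0.667
```

Under these assumed numbers, a single straggler sets the ready time for the whole cluster, and a seconds-class reset consumes most of each short attempt, which is the pressure toward state-reuse described above.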

Cost of deviating from on-vendor happy paths to keep fast-boot¶

Fly.io's 2025-02-14 retrospective is the wiki's cleanest case study. The Nvidia driver happy path sits on QEMU / VMware, and adopting either would have cost Fly.io its millisecond-boot DX requirement. Fly chose Cloud Hypervisor off-path, ate months of failed driver-integration work, and eventually scaled back the GPU product rather than regress to QEMU. The DX constraint was stronger than the driver-compatibility constraint.

Other instances¶

AWS Lambda (2014–present). Started with single-tenant EC2 instances; transitioned to Firecracker micro-VMs in part so per-invocation VM isolation would be economically viable at ms-boot. See sources/2024-11-15-allthingsdistributed-aws-lambda-prfaq-after-10-years.
Figma server-side sandboxing. Figma uses AWS Lambda (backed by Firecracker) for stateless fetch-and-process workloads. Even so, Firecracker's boot cost is "still too high to pay on many core workflows" — Lambda VMs get reused within a tenant for synchronous workloads. Even ms-boot isn't always fast enough. See sources/2026-04-21-figma-server-side-sandboxing-virtual-machines.

Caveats¶

"DX" is doing load-bearing work here: this concept is phrased from the platform operator's vantage, but it is the downstream user (the "developer") whose experience matters. The platform can have ms-boot and still ship a bad DX through other surfaces (API latency, scheduler behaviour, error-handling); ms-boot is necessary, not sufficient.
Boot time is one axis, not the only one. Memory allocation, image pull, filesystem setup, and network setup all add to it. Firecracker's ~125 ms covers only VMM start; the user-visible "Lambda invocation cold start" is a composite.
Firecracker's boot is fast because the guest kernel is small and the device model is minimal. If the guest needs a full Linux distro + systemd, the boot advantage shrinks.
Not every product wants ms-boot. A database doesn't care — it lives for weeks. The concept matters for ephemeral-or-scale-to-zero shapes.
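
The "composite" caveat above can be made concrete with a small budget model. This is a sketch only: it assumes the stages run strictly one after another (real systems may overlap them), the class name `ColdStartBudget` is hypothetical, and every value except the ~125 ms VMM boot is an invented placeholder rather than a measurement.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ColdStartBudget:
    """Per-stage cold-start costs in milliseconds, assumed to run serially."""
    vmm_boot_ms: float
    memory_allocation_ms: float
    image_pull_ms: float
    filesystem_setup_ms: float
    network_setup_ms: float

    def total_ms(self) -> float:
        # User-visible cold start is the sum of every stage, not just VMM boot.
        return sum(getattr(self, f.name) for f in fields(self))

    def vmm_share(self) -> float:
        # Fraction of the user-visible cold start attributable to VMM boot.
        return self.vmm_boot_ms / self.total_ms()


# Hypothetical budget: only vmm_boot_ms echoes a figure from this page.
budget = ColdStartBudget(
    vmm_boot_ms=125.0,
    memory_allocation_ms=20.0,
    image_pull_ms=300.0,
    filesystem_setup_ms=40.0,
    network_setup_ms=15.0,
)

print(budget.total_ms())   # 500.0
print(budget.vmm_share())  # 0.25
```

With these placeholder values the VMM accounts for a quarter of the cold start, so shaving VMM boot further would matter less than, say, avoiding the image pull.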

Seen in (wiki)¶

sources/2025-02-14-flyio-we-were-wrong-about-gpus — Fly.io's explicit "millisecond boot is the whole point" framing.
sources/2026-01-14-flyio-the-design-implementation-of-sprites — Sprites' 1-2 second create is delivered not by fast cold boot but by warm pools, a different implementation of the same fast-boot DX promise. Ptacek: "Every physical worker knows exactly what container the next Sprite is going to start with, so it's easy for us to keep pools of 'empty' Sprites standing by. The result: a Sprite create doesn't have any heavy lifting to do; it's basically just doing the stuff we do when we start a Fly Machine." Fast-VM-boot-DX can be realised at the create level via warm pools when the product shape allows a uniform base image (see concepts/no-container-image-sprite, patterns/warm-pool-zero-create-path).
sources/2024-11-15-allthingsdistributed-aws-lambda-prfaq-after-10-years — AWS Lambda's architectural commit to Firecracker for per-invocation isolation at fast-boot.
sources/2026-04-21-figma-server-side-sandboxing-virtual-machines — concrete example of Firecracker boot being both fast-enough (for async) and too-slow (for sync) on the same platform.
sources/2026-01-09-flyio-code-and-let-live — Fly.io's 2026-01-09 Sprites launch composes fast-boot-DX with durability: 1-2s create latency and indefinite lifetime and ~1s checkpoint restore. The wiki's earlier fast-boot-DX framing implicitly paired ms-boot with ephemeral-lifecycle (Lambda, disposable-VM agent loops); Sprites show the two axes are independent. Same ms-boot primitive, different lifecycle choice.

Related¶

concepts/nvidia-driver-happy-path — the supplier-side constraint that Fast-VM-boot-DX forced Fly.io off.
concepts/micro-vm-isolation — the isolation half of the trade-off.
concepts/cold-start — the request-first-to-hit-a-new-VM latency that fast-boot optimises.
concepts/scale-to-zero — the economic pattern fast-boot enables.
systems/firecracker / systems/intel-cloud-hypervisor — fast-boot instances.
systems/qemu — slow-boot comparison.
systems/fly-machines — the product whose DX promise this concept underwrites.
systems/fly-sprites — fast-boot-DX composed with durability instead of ephemerality.
concepts/durable-vs-ephemeral-sandbox — the orthogonal axis fast-boot-DX composes with.
concepts/first-class-checkpoint-restore — the durable-side property that fast-boot makes casual.
companies/flyio — canonical wiki statement.