
SYSTEM Cited by 10 sources

Firecracker

Firecracker is AWS's open-source KVM-based micro-VM monitor, used under AWS Lambda (and Fargate) to run many tenants densely on shared bare metal while preserving hardware-level isolation.
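To make "micro-VM monitor" concrete: a Firecracker process is configured through a REST API served on a Unix domain socket (the `--api-sock` flag), and the guest only starts when an explicit start action is sent. The sketch below uses endpoint and field names from Firecracker's public API (`/machine-config`, `/boot-source`, `/drives/{id}`, `/network-interfaces/{id}`, `/actions`). The socket path, kernel and rootfs paths, tap name and sizes are placeholder assumptions, the `boot_plan`/`boot` helpers are hypothetical, and nothing here describes how Lambda's own control plane drives Firecracker.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP/1.1 over a Unix domain socket, the transport Firecracker's API uses."""

    def __init__(self, socket_path: str):
        super().__init__("localhost")  # used for the Host header only; no TCP
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self.socket_path)
        self.sock = sock


def boot_plan(kernel_path: str, rootfs_path: str, tap_name: str,
              vcpus: int = 1, mem_mib: int = 128) -> list[tuple[str, dict]]:
    """Ordered PUT requests that configure one microVM and then start it."""
    return [
        ("/machine-config", {"vcpu_count": vcpus, "mem_size_mib": mem_mib}),
        ("/boot-source", {
            "kernel_image_path": kernel_path,
            "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
        }),
        ("/drives/rootfs", {
            "drive_id": "rootfs",
            "path_on_host": rootfs_path,
            "is_root_device": True,
            "is_read_only": False,
        }),
        # Attaches an existing host tap device; creating it is the host's job.
        ("/network-interfaces/eth0", {"iface_id": "eth0", "host_dev_name": tap_name}),
        # Earlier calls only configure; the guest starts on this action.
        ("/actions", {"action_type": "InstanceStart"}),
    ]


def boot(socket_path: str, plan: list[tuple[str, dict]]) -> None:
    """Send each request in order, stopping at the first non-2xx response."""
    conn = UnixHTTPConnection(socket_path)
    try:
        for path, body in plan:
            conn.request("PUT", path, body=json.dumps(body),
                         headers={"Content-Type": "application/json",
                                  "Accept": "application/json"})
            resp = conn.getresponse()
            detail = resp.read()  # drain the body so the connection can be reused
            if not 200 <= resp.status < 300:
                raise RuntimeError(f"PUT {path} -> {resp.status}: {detail!r}")
    finally:
        conn.close()
```

Against a Firecracker process started with `--api-sock /tmp/firecracker.socket`, usage would look like `boot("/tmp/firecracker.socket", boot_plan("vmlinux", "rootfs.ext4", "tap0"))`. The small, fixed device set in the plan (one block device, one virtio-net interface) reflects the minimal device model that the later entries credit for fast boot and a small attack surface.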

Why Lambda needed it

Lambda launched (Nov 2014) with a hard rule: "security is not negotiable — no two customers share an instance." To enforce that, each customer got single-tenant EC2 instances. This was expensive, but the team "knew long-term that it was a problem we could solve." Firecracker is the system that solved it: micro-VMs preserve the same multi-tenant security property while packing "thousands of micro VMs onto a single bare metal instance."

(Source: sources/2024-11-15-allthingsdistributed-aws-lambda-prfaq-after-10-years)

Architectural role

  • Provides hardware-virtualisation isolation per function invocation context — stronger than container isolation, with far less overhead than a conventional general-purpose VM.
  • Enables dense multi-tenant packing on bare-metal hosts, which is what makes Lambda's placement engine able to honour the scale-to-zero / per-ms billing model without idle-capacity waste. See concepts/micro-vm-isolation, concepts/scale-to-zero.
  • Underpins SnapStart (2022) — Firecracker VM snapshotting restores an initialized runtime near-instantaneously, cutting Java cold-start latency by up to 90%. See concepts/cold-start.
  • Co-evolves with the on-demand container loading work (Marc Brooker, USENIX ATC '23) that lets Lambda pull 10 GB container images without cold-start blowup.
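The SnapStart bullet rests on Firecracker's snapshot/restore primitive: pause an already-initialized VM, write its device state and guest memory to files, then load those files into a fresh Firecracker process instead of booting. The sketch below only builds the API requests (method, path, JSON body) for that primitive, using field names from Firecracker's public snapshotting docs. The `ApiCall` type and helper names are hypothetical, the file paths are placeholders, request bodies can differ between Firecracker versions, and how SnapStart itself stores, distributes or schedules snapshots is not described on this page and is not modeled here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ApiCall:
    method: str
    path: str
    body: dict = field(default_factory=dict)


def snapshot_calls(snapshot_path: str, mem_path: str) -> list[ApiCall]:
    """Pause a running, already-initialized microVM and write a full snapshot."""
    return [
        # Snapshots are taken from a paused VM.
        ApiCall("PATCH", "/vm", {"state": "Paused"}),
        # Device/vCPU state goes to snapshot_path; guest RAM goes to mem_file_path.
        ApiCall("PUT", "/snapshot/create", {
            "snapshot_type": "Full",
            "snapshot_path": snapshot_path,
            "mem_file_path": mem_path,
        }),
    ]


def restore_calls(snapshot_path: str, mem_path: str) -> list[ApiCall]:
    """Load a snapshot into a fresh, unconfigured Firecracker process and resume.

    This replaces the whole boot sequence: neither the guest kernel boot nor
    the runtime's initialization runs again, which is the cold-start saving.
    """
    return [
        ApiCall("PUT", "/snapshot/load", {
            "snapshot_path": snapshot_path,
            "mem_backend": {"backend_path": mem_path, "backend_type": "File"},
            "resume_vm": True,
        }),
    ]
```

Restoring one snapshot into many processes yields clones with identical guest state. By default the snapshot also records the host tap device name, so concurrent clones generally each need their own network namespace holding a tap of that name, which is consistent with the per-clone network namespace requirement described in the 2026-04-22 networking entry.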

Seen in

  • sources/2024-11-15-allthingsdistributed-aws-lambda-prfaq-after-10-years — referenced as the isolation-density evolution from launch single-tenant EC2 to today's multi-tenant micro-VM fleet; enables SnapStart.
  • sources/2026-04-21-figma-server-side-sandboxing-virtual-machines — production tenant example: Figma uses AWS Lambda (backed by Firecracker) as its VM-grade sandbox for stateless fetch-and-process workloads (link-preview metadata / canvas image fetch via ImageMagick), deliberately placed outside the production VPC with no IAM pivot into Figma internals (patterns/minimize-vm-permissions). Also names Firecracker's boot overhead as the reason AWS reuses Lambda VMs within a single tenant for synchronous workloads — "Firecracker offers reasonably quick VM boot times, but the overheads are still too high to pay on many core workflows" — a concrete latency/isolation trade-off at the Lambda-customer level.
  • sources/2024-03-07-flyio-fly-kubernetes-does-more-now — Firecracker as the Pod substrate under Fly Kubernetes: every K8s Pod is a Fly Machine (Firecracker micro-VM), orchestrated by flyd rather than containerd / runc. Canonical wiki instance of concepts/micro-vm-as-pod at the K8s API level — distinct from Lambda's use of Firecracker at the serverless-function level. Fly frames it as "our system transmogrifies Docker containers into Firecracker microVMs".
  • sources/2024-02-15-flyio-globally-distributed-object-storage-with-tigris — contextual reference only: "we transmute containers into VMs, running them on our hardware around the world with the power of Firecracker alchemy". Fly.io self-identifies its compute substrate as Firecracker-based while pitching the Tigris object-storage partnership; Firecracker itself is not central to Tigris's three-layer architecture (FoundationDB metadata + NVMe byte cache + QuiCK-style queue).
  • sources/2024-06-19-flyio-aws-without-access-keys — contextual substrate reference: Firecracker micro-VM is what Fly init runs inside and what Fly Machine instances are. The OIDC-federation post treats Firecracker as existing infrastructure and doesn't expose hypervisor-level detail; relevance is that the Macaroon-scoped-per-Machine identity model depends on Firecracker-grade isolation (a container-escape would let one Machine's Macaroon escape to another's workload).
  • sources/2024-08-15-flyio-were-cutting-l40s-prices-in-half — negative-space datum on GPU-in-micro-VM. Fly.io tried to surface fractional-GPU slicing (NVIDIA MIG + vGPUs) inside Firecracker Machines via IOMMU PCI passthrough and abandoned the effort after "a whole quarter" on what the post calls "a project so cursed that Thomas has forsworn ever programming again." The reasons are not disclosed, but the datum itself is load-bearing: IOMMU-passthrough-based fractional-GPU virtualisation on Firecracker is not a turnkey path even for a Firecracker-native platform. Fly.io pivoted to attaching whole GPUs (A10 / L40S / A100 / H100) per Machine, which is the path currently productised.

  • sources/2025-02-14-flyio-we-were-wrong-about-gpus — Fly.io's 2025-02 GPU retrospective clarifies the Firecracker-vs-Cloud Hypervisor split. Non-GPU Fly Machines run on Firecracker; GPU Fly Machines run on Cloud Hypervisor (a "very similar Rust codebase" that supports PCI passthrough). Firecracker's minimal device model — the property that gives it fast-boot + small attack surface — is also why it can't host GPU passthrough; Fly had to pick a sibling VMM for the peripheral-accelerator path. This is the cleanest public disclosure of Firecracker's positioning relative to its PCI-passthrough-capable cousin. The 2025-02 post also elaborates on the 2024-08 MIG/vGPU failure datum: Fly "burned months trying (and ultimately failing) to get Nvidia's host drivers working to map virtualized GPUs into Intel Cloud Hypervisor. At one point, we hex-edited the closed-source drivers to trick them into thinking our hypervisor was QEMU" — Firecracker wasn't the integration target, but the underlying micro-VM posture is what put Fly off Nvidia's driver happy-path in the first place. Confirms fast-boot DX as the non-negotiable product requirement that forced the off-path choice ("we could not have offered our desired Developer Experience on the Nvidia happy-path").

  • sources/2025-06-20-flyio-phoenixnew-remote-ai-runtime-for-phoenix — Firecracker as the per-session cloud-IDE perimeter for Phoenix.new. Every Phoenix.new browser session is backed by a fresh Fly Machine (Firecracker micro-VM) that the user and a coding agent share as co-tenants with root access. The safety posture (concepts/agent-with-root-shell) depends on Firecracker's KVM isolation: the agent has full freedom inside the guest precisely because the guest is a disposable VM whose boundary is enforced by the hypervisor. Same ephemeral-VM-as-agent-sandbox property as patterns/disposable-vm-for-agentic-loop (the 2025-02-07 sketch) — now productised as the default Phoenix.new session shape (patterns/ephemeral-vm-as-cloud-ide). Firecracker's fast-boot property (concepts/fast-vm-boot-dx) is the precondition that makes per-session VMs viable as a product shape; slower hypervisors would force session reuse and reintroduce environment drift.

  • sources/2026-04-22-allthingsdistributed-invisible-engineering-behind-lambdas-network — density-unlock-requires-networking-retrofit arc on top of Firecracker. The 2024-11-15 PR/FAQ post established Firecracker-micro-VM as the replacement for single-tenant EC2 that enabled "thousands of micro VMs onto a single bare metal instance"; this 2026-04-22 post discloses what had to happen in networking to actually realize that 4,000-VM-per-worker density in production. Specifically: the Geneve-tunnel + Linux-RTNL + iptables + conntrack bottlenecks that didn't matter at launch density became architectural blockers at SnapStart density. Every eBPF / iptables / boot-time-pre-creation fix Lambda shipped was in service of letting Firecracker's packing guarantee cash out operationally. Canonical wiki disclosure that Firecracker's density is only realizable with a re-engineered network topology (not a property of the hypervisor alone). Also names SnapStart as the specific Firecracker-snapshot capability that forced the topology unification — each SnapStart clone needs an isolated, pre-created network namespace with tap + bridge + veth + tunnel, which the original on-demand path couldn't produce.

  • sources/2026-04-24-atlassian-rovo-dev-driven-development — Atlassian Fireworks (2026-04-24) is the canonical wiki instance of Firecracker as the µVM substrate for an internal AI-agent execution platform, composed with Kubernetes rather than standalone or behind a FaaS abstraction. Feature parity with the Lambda / Fly-Machines Firecracker fleets is explicit: "100ms warm starts, live migration between hosts, eBPF network policy enforcement, shared volumes, and snapshot filesystem restore, sidecar sandboxes." The post names the full control plane Atlassian had to build on top of Firecracker ("scheduler, autoscaler, node agents, envoy ingress layers, raft persistence"), which is the same component list every Firecracker-based orchestrator ends up building, differing mostly in whether K8s or a custom surface is the engineer-facing API. First wiki disclosure of systems/atlassian-fireworks; first wiki canonicalisation of concepts/hardware-isolated-microvm-on-kubernetes as a named shape distinct from standalone Firecracker-as-µVM-monitor and from Firecracker-behind-FaaS. Also a notable cultural data point: the platform was "built in four weeks, entirely by LLMs," making its Firecracker integration agent-authored rather than human-authored, with the AI-written e2e tests substituting for line-by-line code review.
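The Figma entry records that AWS reuses Lambda VMs within a single tenant for synchronous workloads because a fresh Firecracker boot per request is still too costly. A minimal sketch of that reuse policy follows, assuming a hypothetical `TenantAffinePool` class and an injected `boot` callback standing in for a real Firecracker boot. It only illustrates the invariant (warm reuse never crosses a tenant boundary) and is not AWS's actual scheduler: there is no eviction, idle timeout, or thread safety.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class MicroVM:
    vm_id: int
    tenant: str


class TenantAffinePool:
    """Reuse warm VMs across requests, but only within the owning tenant."""

    def __init__(self, boot: Callable[[str], MicroVM]):
        self._boot = boot  # the expensive path: a real VM boot
        self._idle: dict[str, list[MicroVM]] = defaultdict(list)
        self.cold_boots = 0

    def acquire(self, tenant: str) -> MicroVM:
        idle = self._idle[tenant]
        if idle:
            return idle.pop()  # warm path: skips the boot entirely
        self.cold_boots += 1
        return self._boot(tenant)

    def release(self, vm: MicroVM) -> None:
        # A VM only ever returns to its owner's idle list, so a guest that
        # once ran tenant A's code is never handed to tenant B.
        self._idle[vm.tenant].append(vm)
```

The trade-off shows up in `cold_boots`: reuse amortizes boot cost across one tenant's requests, while a new tenant (or a tenant whose warm VMs are all busy) still pays a full boot.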

Last updated · 542 distilled / 1,571 read