SYSTEM Cited by 12 sources
Fly Machines¶
Fly Machines are Fly.io's universal compute primitive: Firecracker micro-VMs orchestrated by flyd and created and managed through the Machines API. A "Machine" is Fly's answer to what other clouds call a VM, an ephemeral container, a Lambda function, or a Fargate task — one primitive that aims to cover all of those.
Role in FKS¶
Under FKS, the Pod is a Fly Machine. The Virtual-Kubelet provider receives a Pod-create request from K3s and issues a Machine-create call against the Machines API; flyd places the Machine; Firecracker boots it. Canonical example of concepts/micro-vm-as-pod.
Visibility from both sides of the abstraction:
`kubectl run --image=<img> kuard` creates a Pod. `fly machine list --app fks-default-<cluster-id>` shows the same workload as a Machine: `shared-cpu-1x:256MB`, STATE `started`, REGION `iad`, IPv6 address on the org's WireGuard mesh.
The per-cluster Fly App name pattern is `fks-default-<cluster-id>` — the `default` Kubernetes namespace's Machines live under that App.
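The Pod→Machine translation above reduces to a single Machines API call. A minimal sketch of the request a Virtual-Kubelet provider could issue — the request shape and `api.machines.dev` endpoint follow the public Machines API docs, not published FKS source; the image and cluster id are illustrative:

```python
import json

def machine_create_payload(image: str, cluster_id: str):
    """Build the Machines API create-Machine request for a Pod.

    Shape assumed from the public Machines API; the actual FKS
    Virtual-Kubelet provider's client code is not published.
    """
    # Per-cluster Fly App holding the default namespace's Machines.
    app = f"fks-default-{cluster_id}"
    payload = {
        "config": {
            "image": image,  # the Pod's container image
            "guest": {       # shared-cpu-1x:256MB, the default sizing floor
                "cpu_kind": "shared",
                "cpus": 1,
                "memory_mb": 256,
            },
        },
    }
    # The provider would POST this to:
    #   https://api.machines.dev/v1/apps/{app}/machines
    return app, payload

app, payload = machine_create_payload("kuard", "c4ff33")
print(app)                       # fks-default-c4ff33
print(json.dumps(payload, indent=2))
```

flyd then places the resulting Machine and Firecracker boots it; nothing Kubernetes-specific survives below the provider.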
Seen in¶
- sources/2024-03-07-flyio-fly-kubernetes-does-more-now — named as the Pod substrate under FKS ("Pods → Fly Machines VMs"); shown directly via `fly machine list` against a `kubectl run`-created Pod.
- sources/2024-06-19-flyio-aws-without-access-keys — Machine as the identity unit for Fly.io's OIDC federation. Every Machine has a platform-attested identity (`<org>:<app>:<machine>` embedded in the `sub` claim of OIDC tokens it can request from oidc.fly.io); init inside the Machine exposes a Unix-socket proxy at `/.fly/api` (self-described as "our answer to the EC2 Instance Metadata Service") and brokers OIDC tokens via the Machines-API endpoint `/v1/tokens/oidc`. Machine-scoped Macaroon tokens — issued by flyd at boot — are what make the `/.fly/api` requests authenticatable; the guest code never sees them. Canonical workload-identity + machine-metadata-service instance on Fly.io.
- sources/2024-08-15-flyio-were-cutting-l40s-prices-in-half — GPU-attached Machines are how Fly.io productises GPU compute. A10 / L40S / A100 40G PCIe / A100 80G SXM / H100 attach to a Fly Machine via whole-GPU passthrough; the 2023 attempt to surface fractional GPUs via NVIDIA MIG + vGPU over IOMMU PCI passthrough was abandoned ("a project so cursed that Thomas has forsworn ever programming again"). Machines are the compute half of Fly.io's GPU + object-storage co-location pattern — combined with Tigris for model weights / datasets and Anycast for ingress.
- sources/2024-07-30-flyio-making-machines-move — Stateful Machine migration: the 2024 rebuild of Fly's fleet-drain capability for Machines with attached Fly Volumes. The migration sequence `kill → clone → boot` decouples destination availability from data transfer. A new Machine boots on the target worker with a `dm-clone` volume; reads fall through to the source worker over iSCSI; `kcopyd` rehydrates in the background. By summer 2024 Fly's infra team can "pull 'drain this host' out of their toolbelt without much ceremony" — though automated rebalancing migrations are still gated. Canonical patterns/async-block-clone-for-stateful-migration instance.
- sources/2024-09-24-flyio-ai-gpu-clusters-from-your-laptop-with-livebook — Fly Machines as the elastic-executor substrate driven from a Livebook notebook via FLAME. `Flame.call` provisions per-cell pools of Machines (64 L40S Machines in the canonical BERT hyperparameter-tuning demo); the whole cluster terminates on notebook disconnect. Fly.io's stated platform contribution: "start a cluster of GPUs in seconds rather than minutes, and all it requires is a Docker image" (concepts/seconds-scale-gpu-cluster-boot). Canonical patterns/notebook-driven-elastic-compute instance.
- sources/2025-02-07-flyio-vscodes-ssh-agent-is-bananas — Fly Machines as the disposable-VM substrate for LLM agentic loops. Fly.io's 2025-02-07 post argues for (but defers the productisation of) using a Fly Machine as the target host for VSCode Remote-SSH when running agentic coding loops: the Machine is clean-slate, boots in under a second, and is discardable — the three requirements the post names for "a closed-loop agent-y configuration for an LLM, on a clean-slate Linux instance that spins up instantly and that can't screw you over in any way." Canonical wiki instance of patterns/disposable-vm-for-agentic-loop.
- sources/2025-02-12-flyio-the-exit-interview-jp-phillips — platform-completeness framing from the engineer who built the orchestrator. "The Fly Machines platform is more or less finished, in the sense of being capable of supporting the next iteration of our products. My original desire to join Fly.io was to make Machines a product that would rid us of HashiCorp Nomad, and I feel like that's been accomplished." Also discloses sub-5-second P90 on `machine create` globally via flaps (Johannesburg and Hong Kong excepted), and introduces pilot as the OCI-compliant init successor with a formal flyd-driven API. Same interview floats per-Fly-Machine SQLite as an alternate design JP would consider (see patterns/per-instance-embedded-database).
- sources/2025-02-14-flyio-we-were-wrong-about-gpus — the Fly Machine GPU variant runs on Cloud Hypervisor, not Firecracker — and the product is being retrenched. Two load-bearing disclosures for the Fly Machines page:
  - Hypervisor split by workload class. Non-GPU Machines run on Firecracker; GPU Machines run on Intel Cloud Hypervisor (PCI passthrough support). "A very similar Rust codebase that supports PCI passthrough." Fly rejected QEMU (would have worked with Nvidia drivers out of the box but violated the millisecond-boot DX requirement) and VMware (institutional fit). Fly.io also runs GPU Machines on dedicated GPU-only worker hosts — canonical wiki instance of patterns/dedicated-host-pool-for-hostile-peripheral, motivated by GPU-as-hostile-peripheral framing. Two external security audits (Atredis, Tetrel) cleared the productisation shape.
  - GPU Machine product retrenchment. Canonical wiki instance of patterns/platform-retrenchment-without-customer-abandonment. Fly.io announces "if you're using Fly GPU Machines, don't freak out; we're not getting rid of them. But if you're waiting for us to do something bigger with them, a v2 of the product, you'll probably be waiting awhile." Diagnosis is demand-side: developers want LLMs, not GPUs — for transaction-shape developer workloads, OpenAI / Anthropic APIs win on tokens-per-second, and Fly's compute-storage-network locality thesis survives but doesn't drive demand. The L40S customer segment persists ("a bunch of these!") as the remaining product-market fit. Also names the MIG-as-UUID-not-PCI limitation that keeps the thin-sliced-GPU market segment unreachable on a PCI-passthrough-based micro-VM hypervisor.
- sources/2024-05-09-flyio-picture-this-open-source-ai-for-image-description — GPU Fly Machines under proxy-managed autostop with a disclosed cold-start number. Ollama + LLaVA-34b on the `a100-40gb` preset behind Flycast; Fly Proxy stops the Machine on idle and starts it on the next internal request from the PocketBase app tier. Discloses the three-stage GPU cold-start budget ("starting it up took another handful of seconds, followed by several tens of seconds to load the model into GPU RAM. The total time from cold start to completed description was about 45 seconds") — canonical wiki datum for concepts/gpu-scale-to-zero-cold-start. Also names the explicit `a100-40gb` preset string used for the GPU Machine shape. Canonical instance of patterns/proxy-autostop-for-gpu-cost-control + patterns/flycast-scoped-internal-inference-endpoint on Fly Machines.
- sources/2025-04-08-flyio-our-best-customers-are-now-robots — Fly Machines as the RX-shaped robot-workload compute primitive with two load-bearing disclosures for this page. (1) `start` vs `create` lifecycle split. "There are two ways to start a Fly Machine: by creating it with a Docker container, or by starting it after it's already been created, and later stopped. Start is lightning fast; substantially faster than booting up even a non-virtualized K8s Pod. This is too subtle a distinction for humans, who (reasonably!) just mash the create button to boot apps up in Fly Machines. But the robots are getting a lot of value out of it." Canonical wiki instance of concepts/fly-machine-start-vs-create and patterns/start-fast-create-slow-machine-lifecycle. (2) Hypervisor shared with AWS Lambda. "Not coincidentally, our underlying hypervisor engine is the same as Lambda's. […] Like a Lambda invocation, a Fly Machine can start like it's spring-loaded, in double-digit millis. But unlike Lambda, it can stick around as long as you want it to: you can run a server, or a 36-hour batch job, just as easily in a Fly Machine as in an EC2 VM." First wiki disclosure that Fly Machines run on the same Firecracker substrate as Lambda (Fly.io's 2025-02-14 GPU retrospective disclosed the Cloud-Hypervisor-for-GPU split; this 2025-04-08 disclosure nails down that non-GPU Machines = Firecracker = Lambda's hypervisor). Also the first wiki framing of Fly Machines as the Lambda–EC2 hybrid ("fast start like Lambda, long-run like EC2"). Positions Fly Machines as the substrate for vibe-coding sessions — bursty-then-idle-for-hours workloads that fit `start`/`stop` exactly. Also relevant to patterns/session-affinity-for-mcp-sse (Fly Machines are the stateful instances MCP SSE connections pin to via dynamic request routing) and patterns/tokenized-token-broker (Fly Machines are the hardware-isolated substrate for the "robot-free" tokenized-token broker). Canonical RX primitive on the compute-lifecycle axis.
- sources/2025-05-19-flyio-launching-mcp-servers-on-flyio —
Fly Machines as the remote-MCP-server substrate for the
`fly mcp launch` flyctl subcommand. The default deployment shape of a `fly mcp launch`-deployed MCP server is "each MCP server to a separate Machine" — one Machine = one MCP server, with all Machine-level knobs (auto-stop via concepts/scale-to-zero, Flycast private-network exposure, Volumes, region pinning, VM size) available. Alternative deployment shapes also listed: one-container-per-server and in-process-as-library. See systems/fly-mcp-launch for the full subcommand surface and patterns/remote-mcp-server-via-platform-launcher for the generalised pattern.
- sources/2025-06-20-flyio-phoenixnew-remote-ai-runtime-for-phoenix — Fly Machines as the per-session cloud-IDE substrate for Phoenix.new. Every browser session is a fresh Fly Machine with a root shell shared between the developer and a Phoenix-tuned coding agent, a full Chrome the agent drives, `*.phx.run` preview URLs from any bound port, and the `gh` CLI pre-installed. Canonical productised instance of patterns/ephemeral-vm-as-cloud-ide — the sibling shape to the patterns/disposable-vm-for-agentic-loop sketch from 2025-02-07 (same substrate, same goal, now shipped as a product). Adds "per-session cloud-IDE for coding agents" to the set of first-class Fly Machine workload shapes alongside stateful apps, FKS Pods, GPU inference, MCP-server hosting, disposable agent VMs, and OIDC-federated workload-identity units. The agent-with-root-shell posture (concepts/agent-with-root-shell) explicitly treats the Fly Machine boundary (Firecracker + KVM isolation) as the blast-radius limit; the agent has maximum freedom inside the VM because the VM is ephemeral and disposable.
- Post does not disclose the mapping from K8s `resources: requests / limits` to Fly Machine shapes (the example Machine is `shared-cpu-1x:256MB`, the default sizing floor).
- No published Pod-create latency numbers for FKS — Firecracker is fast, but the combined VK-provider + Machines-API + flyd-placement + Firecracker-boot path is not quantified.
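The `<org>:<app>:<machine>` identity described in the OIDC-federation entry above can be unpacked without any Fly-specific tooling once a token is in hand. A minimal sketch using a fabricated, unsigned token — the `sub` claim layout comes from the source; the org, app, and machine names are invented for illustration, and a real consumer must verify the signature against oidc.fly.io's keys first:

```python
import base64
import json

def machine_identity(jwt: str):
    """Split a Fly.io OIDC token's sub claim (<org>:<app>:<machine>)
    into its three components. Signature verification is deliberately
    omitted here; never skip it outside a sketch."""
    payload_b64 = jwt.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    org, app, machine = claims["sub"].split(":")
    return org, app, machine

# Fabricated token carrying only the sub claim, for illustration.
header = base64.urlsafe_b64encode(b'{"alg":"RS256"}').rstrip(b"=").decode()
body = base64.urlsafe_b64encode(
    json.dumps({"sub": "acme:fks-default-c4ff33:148e123b"}).encode()
).rstrip(b"=").decode()
token = f"{header}.{body}.sig"

print(machine_identity(token))  # ('acme', 'fks-default-c4ff33', '148e123b')
```

Inside a Machine, such a token would come from the `/.fly/api` Unix-socket proxy (`/v1/tokens/oidc`); the Macaroon credential that authenticates that request never reaches guest code.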
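The `start` vs `create` lifecycle split from the 2025-04-08 entry maps onto distinct Machines API endpoints. A minimal sketch of the three calls — paths follow the public Machines API; the app and Machine names are illustrative:

```python
BASE = "https://api.machines.dev/v1"

def lifecycle_urls(app: str, machine_id: str):
    """The two ways to get a running Machine: `create` (slow path:
    config + image pull + first boot) versus `start` on an
    already-created, stopped Machine (the fast, spring-loaded path).
    Endpoint paths per the public Machines API docs."""
    return {
        "create": f"{BASE}/apps/{app}/machines",                    # POST, full config body
        "stop":   f"{BASE}/apps/{app}/machines/{machine_id}/stop",  # POST, Machine persists
        "start":  f"{BASE}/apps/{app}/machines/{machine_id}/start", # POST, double-digit-ms boot
    }

urls = lifecycle_urls("vibe-session", "148e123b")
print(urls["start"])
```

The bursty-then-idle workloads the post describes (vibe-coding sessions, robot-driven jobs) loop on `stop`/`start` and only pay the `create` cost once.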