
SYSTEM · Cited by 12 sources

Fly Machines

Fly Machines are Fly.io's universal compute primitive: Firecracker micro-VMs orchestrated by flyd and created and managed through the Machines API. A "Machine" is Fly's answer to what other clouds call a VM, an ephemeral container, a Lambda function, or a Fargate task — one primitive that aims to cover all of those.

Role in FKS

Under FKS, the Pod is a Fly Machine. The Virtual Kubelet provider receives a Pod-create request from K3s and issues a Machine-create call against the Machines API; flyd places the Machine; Firecracker boots it. Canonical example of concepts/micro-vm-as-pod.

Visibility from both sides of the abstraction:

  • kubectl run --image=<img> kuard creates a Pod.
  • fly machine list --app fks-default-<cluster-id> shows the same workload as a Machine: shared-cpu-1x:256MB, STATE started, REGION iad, IPv6 address on the org's WireGuard mesh.

The per-cluster Fly App name pattern is fks-default-<cluster-id> — the default Kubernetes namespace's Machines live under that App.
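The Pod-to-Machine mapping above can be sketched in a few lines. The App-name pattern (fks-default-&lt;cluster-id&gt;) and the shared-cpu-1x:256MB guest shape come from the source; generalising the pattern to other namespaces, the endpoint path, and the request-body field names follow the public Machines API shape, and the helper functions themselves are illustrative — not FKS's actual provider code.

```python
# Hypothetical sketch of how a Virtual Kubelet provider might map a
# Pod-create request onto a Machines API call. Field names follow the
# public Machines API; the helpers are illustrative, not FKS internals.

def fly_app_for(namespace: str, cluster_id: str) -> str:
    """Per-cluster Fly App holding a namespace's Machines
    (assumes the fks-default-<cluster-id> pattern generalises)."""
    return f"fks-{namespace}-{cluster_id}"

def machine_create_request(namespace: str, cluster_id: str,
                           pod_name: str, image: str) -> tuple[str, dict]:
    """Build the (path, body) a provider might POST for a Pod."""
    app = fly_app_for(namespace, cluster_id)
    body = {
        "name": pod_name,
        "config": {
            "image": image,
            # shared-cpu-1x:256MB -- the default sizing floor seen in
            # `fly machine list` for an FKS Pod.
            "guest": {"cpu_kind": "shared", "cpus": 1, "memory_mb": 256},
        },
    }
    return f"/v1/apps/{app}/machines", body

path, body = machine_create_request(
    "default", "abc123", "kuard", "gcr.io/kuar-demo/kuard-amd64:blue")
# path -> /v1/apps/fks-default-abc123/machines
```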

Seen in

  • sources/2024-03-07-flyio-fly-kubernetes-does-more-now — named as the Pod substrate under FKS ("Pods → Fly Machines VMs"); shown directly via fly machine list against a kubectl run-created Pod.
  • sources/2024-06-19-flyio-aws-without-access-keys — Machine as the identity unit for Fly.io's OIDC federation. Every Machine has a platform-attested identity (<org>:<app>:<machine> embedded in the sub claim of OIDC tokens it can request from oidc.fly.io); init inside the Machine exposes a Unix-socket proxy at /.fly/api (self-described as "our answer to the EC2 Instant Metadata Service") and brokers OIDC tokens via the Machines-API endpoint /v1/tokens/oidc. Machine-scoped Macaroon tokens — issued by flyd at boot — are what make the /.fly/api requests authenticatable; the guest code never sees them. Canonical workload-identity + machine-metadata-service instance on Fly.io.
  • sources/2024-08-15-flyio-were-cutting-l40s-prices-in-half — GPU-attached Machines are how Fly.io productises GPU compute. A10 / L40S / A100 40G PCIe / A100 80G SXM / H100 attach to a Fly Machine via whole-GPU passthrough; the 2023 attempt to surface fractional GPUs via NVIDIA MIG + vGPU over IOMMU PCI passthrough was abandoned ("a project so cursed that Thomas has forsworn ever programming again"). Machines are the compute half of Fly.io's GPU + object-storage co-location pattern — combined with Tigris for model weights / datasets and Anycast for ingress.
  • sources/2024-07-30-flyio-making-machines-move — Stateful Machine migration: the 2024 rebuild of Fly's fleet-drain capability for Machines with attached Fly Volumes. The migration sequence (kill → clone → boot) decouples destination availability from data transfer: a new Machine boots on the target worker with a dm-clone volume; reads fall through to the source worker over iSCSI; kcopyd rehydrates the volume in the background. By summer 2024 Fly's infra team can "pull 'drain this host' out of their toolbelt without much ceremony" — though automated rebalancing migrations are still gated. Canonical patterns/async-block-clone-for-stateful-migration instance.
  • sources/2024-09-24-flyio-ai-gpu-clusters-from-your-laptop-with-livebook — Fly Machines as the elastic-executor substrate driven from a Livebook notebook via FLAME. Flame.call provisions per-cell pools of Machines (64 L40S Machines in the canonical BERT hyperparameter-tuning demo); the whole cluster terminates on notebook disconnect. Fly.io's stated platform contribution: "start a cluster of GPUs in seconds rather than minutes, and all it requires is a Docker image" (concepts/seconds-scale-gpu-cluster-boot). Canonical patterns/notebook-driven-elastic-compute instance.
  • sources/2025-02-07-flyio-vscodes-ssh-agent-is-bananas — Fly Machines as the disposable-VM substrate for LLM agentic loops. Fly.io's 2025-02-07 post argues for (but defers the productisation of) using a Fly Machine as the target host for VSCode Remote-SSH when running agentic coding loops: the Machine is clean-slate, boots in under a second, and is discardable — the three requirements the post names for "a closed-loop agent-y configuration for an LLM, on a clean-slate Linux instance that spins up instantly and that can't screw you over in any way." Canonical wiki instance of patterns/disposable-vm-for-agentic-loop.
  • sources/2025-02-12-flyio-the-exit-interview-jp-phillips — platform-completeness framing from the engineer who built the orchestrator. "The Fly Machines platform is more or less finished, in the sense of being capable of supporting the next iteration of our products. My original desire to join Fly.io was to make Machines a product that would rid us of HashiCorp Nomad, and I feel like that's been accomplished." Also discloses sub-5-second P90 on machine create globally via flaps (Johannesburg and Hong Kong excepted), and introduces pilot as the OCI-compliant init successor with a formal flyd-driven API. Same interview floats per-Fly-Machine SQLite as an alternate design JP would consider (see patterns/per-instance-embedded-database).
  • sources/2025-02-14-flyio-we-were-wrong-about-gpus — the Fly Machine GPU variant runs on Cloud Hypervisor, not Firecracker — and the product is being retrenched. Two load-bearing disclosures for the Fly Machines page:
  • Hypervisor split by workload class. Non-GPU Machines run on Firecracker; GPU Machines run on Intel Cloud Hypervisor (PCI passthrough support). "A very similar Rust codebase that supports PCI passthrough." Fly rejected QEMU (would have worked with Nvidia drivers out of the box but violated the millisecond-boot DX requirement) and VMware (institutional fit). Fly.io also runs GPU Machines on dedicated GPU-only worker hosts — canonical wiki instance of patterns/dedicated-host-pool-for-hostile-peripheral, motivated by GPU-as-hostile-peripheral framing. Two external security audits (Atredis, Tetrel) cleared the productisation shape.
  • GPU Machine product retrenchment. Canonical wiki instance of patterns/platform-retrenchment-without-customer-abandonment. Fly.io announces "if you're using Fly GPU Machines, don't freak out; we're not getting rid of them. But if you're waiting for us to do something bigger with them, a v2 of the product, you'll probably be waiting awhile." Diagnosis is demand-side: developers want LLMs, not GPUs — for transaction-shape developer workloads, OpenAI / Anthropic APIs win on tokens-per-second, and Fly's compute-storage-network locality thesis survives but doesn't drive demand. L40S customer segment persists ("a bunch of these!") as the remaining product-market fit. Also names the MIG-as-UUID-not-PCI limitation that keeps the thin-sliced-GPU market segment unreachable on a PCI-passthrough-based micro-VM hypervisor.
  • sources/2024-05-09-flyio-picture-this-open-source-ai-for-image-description — GPU Fly Machines under proxy-managed autostop with a disclosed cold-start number. Ollama + LLaVA-34b on the a100-40gb preset behind Flycast; Fly Proxy stops the Machine on idle and starts it on the next internal request from the PocketBase app tier. Discloses the three-stage GPU cold-start budget ("starting it up took another handful of seconds, followed by several tens of seconds to load the model into GPU RAM. The total time from cold start to completed description was about 45 seconds") — canonical wiki datum for concepts/gpu-scale-to-zero-cold-start. Also names the explicit a100-40gb preset string used for the GPU Machine shape. Canonical instance of patterns/proxy-autostop-for-gpu-cost-control + patterns/flycast-scoped-internal-inference-endpoint on Fly Machines.
  • sources/2025-04-08-flyio-our-best-customers-are-now-robots — Fly Machines as the RX-shaped robot-workload compute primitive with two load-bearing disclosures for this page. (1) start vs create lifecycle split. "There are two ways to start a Fly Machine: by creating it with a Docker container, or by starting it after it's already been created, and later stopped. Start is lightning fast; substantially faster than booting up even a non-virtualized K8s Pod. This is too subtle a distinction for humans, who (reasonably!) just mash the create button to boot apps up in Fly Machines. But the robots are getting a lot of value out of it." Canonical wiki instance of concepts/fly-machine-start-vs-create and patterns/start-fast-create-slow-machine-lifecycle. (2) Hypervisor shared with AWS Lambda. "Not coincidentally, our underlying hypervisor engine is the same as Lambda's. […] Like a Lambda invocation, a Fly Machine can start like it's spring-loaded, in double-digit millis. But unlike Lambda, it can stick around as long as you want it to: you can run a server, or a 36-hour batch job, just as easily in a Fly Machine as in an EC2 VM." First wiki disclosure that Fly Machines run on the same Firecracker substrate as Lambda (Fly.io's 2025-02-14 GPU retrospective disclosed the Cloud-Hypervisor-for-GPU split; this 2025-04-08 disclosure nails down that non-GPU Machines = Firecracker = Lambda's hypervisor). Also the first wiki framing of Fly Machines as the Lambda–EC2 hybrid ("fast start like Lambda, long-run like EC2"). Positions Fly Machines as the substrate for vibe-coding sessions — bursty-then-idle-for-hours workloads that fit start/stop exactly. Also relevant to patterns/session-affinity-for-mcp-sse (Fly Machines are the stateful instances MCP SSE connections pin to via dynamic request routing) and patterns/tokenized-token-broker (Fly Machines are the hardware-isolated substrate for the "robot-free" tokenized-token broker). Canonical RX primitive on the compute-lifecycle axis.
  • sources/2025-05-19-flyio-launching-mcp-servers-on-flyio — Fly Machines as the remote-MCP-server substrate for the fly mcp launch flyctl subcommand. The default deployment shape of a fly mcp launch-deployed MCP server is "each MCP server to a separate Machine" — one Machine = one MCP server, with all Machine-level knobs (auto-stop via concepts/scale-to-zero, Flycast private-network exposure, Volumes, region pinning, VM size) available. Alternative deployment shapes also listed: one-container-per-server and in-process-as-library. See systems/fly-mcp-launch for the full subcommand surface and patterns/remote-mcp-server-via-platform-launcher for the generalised pattern.
  • sources/2025-06-20-flyio-phoenixnew-remote-ai-runtime-for-phoenix — Fly Machines as the per-session cloud-IDE substrate for Phoenix.new. Every browser session is a fresh Fly Machine with a root shell shared between the developer and a Phoenix-tuned coding agent, a full Chrome the agent drives, *.phx.run preview URLs from any bound port, and the gh CLI pre-installed. Canonical productised instance of patterns/ephemeral-vm-as-cloud-ide — the sibling shape to the patterns/disposable-vm-for-agentic-loop sketch from 2025-02-07 (same substrate, same goal, now shipped as a product). Adds "per-session cloud-IDE for coding agents" to the set of first-class Fly Machine workload shapes alongside stateful apps, FKS Pods, GPU inference, MCP-server hosting, disposable agent VMs, and OIDC-federated workload-identity units. The agent-with-root-shell posture (concepts/agent-with-root-shell) explicitly treats the Fly Machine boundary (Firecracker + KVM isolation) as the blast-radius limit; the agent has maximum freedom inside the VM because the VM is ephemeral and disposable.
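The platform-attested identity from the 2024-06-19 bullet can be made concrete with a small decoding sketch. The &lt;org&gt;:&lt;app&gt;:&lt;machine&gt; sub-claim format is from the source; everything else — the helper, the fabricated sample token — is illustrative, and a real consumer would verify the token's signature against oidc.fly.io's keys rather than decode it blindly.

```python
# Minimal sketch of reading the platform-attested identity out of a
# Fly.io OIDC token's `sub` claim (<org>:<app>:<machine>, per the
# source). Decode-only for illustration: no signature verification,
# which real consumers must perform. The sample token is fabricated.
import base64
import json

def fly_identity(jwt: str) -> tuple[str, str, str]:
    """Return (org, app, machine) from an (unverified) token's sub claim."""
    payload_b64 = jwt.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    org, app, machine = claims["sub"].split(":")
    return org, app, machine

# Fabricated example token (header.payload.signature):
header = base64.urlsafe_b64encode(b'{"alg":"RS256"}').rstrip(b"=").decode()
payload = base64.urlsafe_b64encode(
    json.dumps({"sub": "my-org:my-app:e2865641f55d38"}).encode()
).rstrip(b"=").decode()
token = f"{header}.{payload}.sig"

print(fly_identity(token))  # ('my-org', 'my-app', 'e2865641f55d38')
```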
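The dm-clone behaviour from the 2024-07-30 migration bullet — destination boots immediately, reads fall through to the source until background copying catches up — can be modelled in a few lines. This is a toy simulation under obvious simplifications (blocks as a list, one source), not Fly's implementation or the kernel target's semantics in full.

```python
# Toy model of the dm-clone read path described in the migration bullet:
# the destination Machine serves reads immediately, falling through to
# the source (over iSCSI in the real system) until kcopyd-style
# background hydration fills each block. Illustration only.

class CloneVolume:
    def __init__(self, source_blocks: list[bytes]):
        self.source = source_blocks            # stands in for the iSCSI target
        self.local = [None] * len(source_blocks)
        self.remote_reads = 0

    def read(self, i: int) -> bytes:
        if self.local[i] is None:              # not hydrated yet: fall through
            self.remote_reads += 1
            self.local[i] = self.source[i]     # a fall-through read also hydrates
        return self.local[i]

    def hydrate_step(self) -> None:
        """One unit of background copying toward full local hydration."""
        for i, block in enumerate(self.local):
            if block is None:
                self.local[i] = self.source[i]
                return

vol = CloneVolume([b"a", b"b", b"c"])
assert vol.read(2) == b"c" and vol.remote_reads == 1   # served from source
vol.hydrate_step()                                      # copies block 0
assert vol.read(0) == b"a" and vol.remote_reads == 1   # now served locally
```

The point of the model is the property the bullet names: the volume is usable (readable) from the first moment, and data transfer proceeds independently of availability.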
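The start-vs-create lifecycle split from the 2025-04-08 robots bullet reduces to a pooling pattern: reuse a stopped Machine via start (the "spring-loaded" fast path) when one exists, and only fall back to a full create (placement plus image pull) when none does. A toy sketch, with fabricated Machine IDs and no real API calls:

```python
# Toy pool illustrating the start-fast / create-slow lifecycle: release
# stops a Machine instead of destroying it, acquire prefers starting a
# stopped Machine over creating a new one. Illustration only.

class MachinePool:
    def __init__(self):
        self.stopped: list[str] = []
        self.created = 0   # slow path: create + image pull + placement
        self.started = 0   # fast path: double-digit-ms start, per the source

    def acquire(self) -> str:
        if self.stopped:
            self.started += 1
            return self.stopped.pop()      # start an existing stopped Machine
        self.created += 1
        return f"machine-{self.created}"   # create a brand-new Machine

    def release(self, machine_id: str) -> None:
        self.stopped.append(machine_id)    # stop, don't destroy

pool = MachinePool()
m = pool.acquire()      # first acquire must create
pool.release(m)         # stop it, keeping it around
m2 = pool.acquire()     # second acquire starts the same Machine
assert (pool.created, pool.started) == (1, 1) and m2 == m
```

This is exactly the bursty-then-idle shape the bullet attributes to robot and vibe-coding workloads: humans mash create; automation amortises one create across many cheap start/stop cycles.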

Not disclosed

  • No source discloses the mapping from Kubernetes resources.requests / resources.limits to Fly Machine shapes (the example Machine is shared-cpu-1x:256MB, the default sizing floor).

  • No published Pod-create latency numbers for FKS — Firecracker is fast, but the combined VK-provider + Machines-API + flyd-placement + Firecracker-boot path is not quantified.
Last updated · 200 distilled / 1,178 read