CONCEPT Cited by 1 source

Inside-out orchestration

Definition

A VM-platform architectural pattern in which the majority of orchestration and management services run inside the VM itself (in the VM's root namespace) rather than on the physical host. The host is a relatively dumb VM launcher; the VM's root namespace contains the storage stack, service manager, log pipeline, network/port-forwarding proxy, and platform-API endpoint. User code runs in an inner container (concepts/inner-container-vm) slid between the user and the kernel, so the platform services have a separate trust / reboot domain from the user workload.

Canonical wiki statement

Fly.io Sprites, 2026-01-14:

"In the cloud hosting industry, user applications are managed by two separate, yet equally important components: the host, which orchestrates workloads, and the guest, which runs them. Sprites flip that on its head: the most important orchestration and management work happens inside the VM. Here's the trick: user code running on a Sprite isn't running in the root namespace. We've slid a container between you and the kernel. You see an inner environment, managed by a fleet of services running in the root namespace of the VM."

"With Sprites, we're pushing this idea as far as we can. The root environment hosts the majority of our orchestration code. When you talk to the global API, chances are you're talking directly to your own VM."

(Source: [[sources/2026-01-14-flyio-the-design-implementation-of-sprites]])

Services named as "inside" (VM root namespace)

Fly.io enumerates:

  • Storage stack — checkpoint/restore, persistence to object storage, JuiceFS-derived data/metadata split.
  • Service manager — registers user code that should restart when a Sprite bounces.
  • Logs — log collection / shipping pipeline.
  • Port-forwarding / ingress proxy — "if you bind a socket to *:8080, we'll make it available outside the Sprite — yep, that's in the root namespace too."
  • Global-API handler — "When you talk to the global API, chances are you're talking directly to your own VM."
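The ingress piece is the easiest of these to picture. A minimal sketch of what an in-guest port-forwarding proxy could look like, in Python; the function names and the loopback-only addressing are illustrative assumptions, not Fly.io's implementation:

```python
import socket
import threading

def pipe(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes one way until the source closes, then close the sink."""
    try:
        while (chunk := src.recv(4096)):
            dst.sendall(chunk)
    finally:
        dst.close()

def serve_proxy(target_port: int, listen_port: int = 0) -> socket.socket:
    """Accept connections on listen_port and forward each to target_port.

    A root-namespace service shaped like this is one way a guest could
    expose an inner-container socket (say, *:8080) to the outside world.
    Returns the listening socket so the caller can learn the bound port.
    """
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", listen_port))
    srv.listen()

    def accept_loop():
        while True:
            client, _ = srv.accept()
            upstream = socket.create_connection(("127.0.0.1", target_port))
            # One thread per direction: client -> inner, inner -> client.
            threading.Thread(target=pipe, args=(client, upstream), daemon=True).start()
            threading.Thread(target=pipe, args=(upstream, client), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return srv
```

The real proxy presumably also handles Anycast-routed external traffic and TLS; the point of the sketch is only where the component lives: inside the VM, outside the inner container.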

Contrast with host-centric orchestration (Fly Machines)

Fly Machines' orchestrator (flyd) runs on the host, implementing VM lifecycle as an FSM and coordinating across a host-wide database. Changes to flyd involve live host-process restarts, cross-host coordination, and whole-fleet migration risk.

Ptacek:

"Platform developers at Fly.io know how much easier it can be to hack on init (inside the container) than things like flyd, the Fly Machines orchestrator that runs on the host. Changes to Sprites don't restart host components or muck with global state. The blast radius is just new VMs that pick up the change. We sleep on how much platform work doesn't get done not because the code is hard to write, but because it's so time-consuming to ensure benign-looking changes don't throw the whole fleet into metastable failure. We had that in mind when we did Sprites."

Properties the inversion buys

1. Per-VM blast radius for platform changes

A change to a Sprite's root-namespace services is picked up by new Sprites that boot after the rollout. Existing Sprites run the prior version until they bounce. No fleet-wide restart; no metastable-failure risk. See patterns/blast-radius-in-vm-not-host.
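The rollout mechanics above can be modeled in a few lines. This is a toy model, not Fly.io code; the `Fleet` class and its method names are hypothetical:

```python
class Fleet:
    """Toy model of per-VM blast radius: a deploy bumps the version that
    new VMs boot with; already-running VMs keep theirs until they bounce."""

    def __init__(self, version: str):
        self.rollout_version = version   # what the next VM will boot with
        self.vms: dict[str, str] = {}    # vm id -> version pinned at boot

    def boot(self, vm_id: str) -> None:
        # The version is captured at boot and never mutated in place.
        self.vms[vm_id] = self.rollout_version

    def deploy(self, version: str) -> None:
        # A deploy touches no running VM -- only the rollout pointer.
        self.rollout_version = version

    def bounce(self, vm_id: str) -> None:
        # A bounced VM comes back on whatever the current rollout is.
        self.boot(vm_id)
```

The contrast with a host-side orchestrator is that `deploy` here is a pointer swap, not a restart of a process that every VM on the host depends on.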

2. Bounce user code without rebooting the VM

"The inner container allows us to bounce a Sprite without rebooting the whole VM, even on checkpoint restores."

The outer root-namespace services stay up; only the inner container restarts. Checkpoint/restore and user-code crash-recovery don't cost a kernel reboot.
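The two reboot domains can be sketched as a toy state model (hypothetical names, not Fly.io's API), with the service manager's register-and-relaunch behavior folded in:

```python
class Sprite:
    """Toy model of the two reboot domains: root-namespace services
    survive a bounce; the inner container, and the user services the
    service manager has registered, restart fresh."""

    def __init__(self):
        self.bounces_survived = 0        # root-namespace services keep running
        self.inner_generation = 0        # increments each inner-container restart
        self.registered: list[str] = []  # user commands to relaunch on bounce
        self.running: list[str] = []

    def register(self, command: str) -> None:
        """User code tells the service manager what should come back up."""
        self.registered.append(command)

    def bounce(self) -> None:
        """Restart the inner container without rebooting the VM."""
        self.bounces_survived += 1            # outer domain: uninterrupted
        self.inner_generation += 1            # inner domain: starts over
        self.running = list(self.registered)  # service manager relaunches user code
```

The same shape covers checkpoint restores: the inner generation advances, the outer services never notice.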

3. API-to-VM is a short path

"When you talk to the global API, chances are you're talking directly to your own VM."

The public platform API for a given Sprite is routed to that Sprite's own root-namespace API handler (via Anycast + Corrosion). Bypasses a central control plane for most per-Sprite operations.

4. Platform team velocity

Ptacek's meta-argument: the implicit cost of platform engineering is not code complexity — it's the care required to deploy benign changes without destabilising a fleet. Inside-out orchestration collapses that deploy-care into VM-launch lifecycle: deploy = new VMs.

Caveats

  • The trust boundary is the inner-container boundary, not the VM boundary. Escape from the inner container into the root namespace reaches the orchestration code. This is a narrower trust boundary than a bare Fly Machine (where the Machine itself is the trust boundary); the post sketches the mechanism ("We've slid a container between you and the kernel") but doesn't discuss the threat model.
  • More work per VM. Every Sprite pays the cost of running the full platform-services set in its root namespace. On dense worker physicals with many idle Sprites, the per-VM overhead compounds.
  • Version skew. Sprites created on different dates run different versions of the root-namespace orchestration code. Compatibility between those and the global API / cross-Sprite services has to hold. The post doesn't discuss the compat regime.
  • Ptacek hints this would have helped Fly Machines too: "I wish we'd done Fly Machines this way to begin with. I'm not sure there's a downside." The wish is aspirational — Fly Machines haven't been refactored to the inside-out shape.

Adjacent shapes

The post frames inside-out as novel. Architecturally adjacent shapes:

  • Firecracker's "jailer" + microservices-in-guest at AWS Lambda (guest-side runtime init handles a lot of per-invocation work).
  • Kata Containers' agent in the guest.
  • VSock / guest-agent daemons generally.

The Sprites novelty is in how much orchestration logic (storage, service management, logs, ingress, API) is pushed into the guest root namespace — most comparable systems use the guest agent as a narrow RPC channel, not as the primary orchestration substrate.

Seen in

  • [[sources/2026-01-14-flyio-the-design-implementation-of-sprites]] — canonical wiki statement.