
PATTERN

Start-fast / create-slow Machine lifecycle

Shape

Expose two distinct Machine-lifecycle primitives through the compute API:

  • create — instantiate a new Machine from an image. Slow (image pull, filesystem layout, orchestrator registration). Billed.
  • start — restart a Machine that already exists but has been stopped. Fast (image already laid out, Machine already registered). Billed while running; not billed while stopped.

The API-level distinction is preserved — there is not a single "run my Machine" button. Clients are expected to create once, start/stop many times. The pattern trades a bit of API surface complexity for an order-of-magnitude faster resume-from-idle path.
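The create-once / start-stop-many contract can be sketched as a thin client. The endpoint paths below follow the published shape of the Fly Machines API, but the helper names and the Machine ID are illustrative, not Fly's SDK:

```python
# Sketch of the create-once / start-stop-many contract. Helper names and the
# Machine ID are hypothetical; endpoint paths follow the Fly Machines API shape.

def create_machine(app: str, image: str) -> tuple[str, str, dict]:
    """Slow path: image pull, filesystem layout, orchestrator registration.
    Called once per Machine. Billed."""
    return ("POST", f"/v1/apps/{app}/machines", {"config": {"image": image}})

def start_machine(app: str, machine_id: str) -> tuple[str, str, dict]:
    """Fast path: re-boot an existing, stopped Machine. No image pull."""
    return ("POST", f"/v1/apps/{app}/machines/{machine_id}/start", {})

def stop_machine(app: str, machine_id: str) -> tuple[str, str, dict]:
    """Stop the Machine; filesystem stays on the worker, billing pauses."""
    return ("POST", f"/v1/apps/{app}/machines/{machine_id}/stop", {})

# One create, then many start/stop cycles.
calls = [create_machine("my-app", "registry.fly.io/my-app:v1")]
for _ in range(3):
    calls.append(start_machine("my-app", "3d8d9z01"))
    calls.append(stop_machine("my-app", "3d8d9z01"))
```

The expensive call appears exactly once; every later wake-up goes through the cheap path.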

Canonical wiki statement

Fly.io, 2025-04-08:

There are two ways to start a Fly Machine: by creating it with a Docker container, or by starting it after it's already been created, and later stopped. Start is lightning fast; substantially faster than booting up even a non-virtualized K8s Pod. This is too subtle a distinction for humans, who (reasonably!) just mash the create button to boot apps up in Fly Machines. But the robots are getting a lot of value out of it.

(Source: sources/2025-04-08-flyio-our-best-customers-are-now-robots)

Why expose two paths

The obvious simplification is one button — "boot this Machine" — that hides whether the Machine needs to be created or can be resumed. Fly.io deliberately exposes both. Why:

  1. Latency asymmetry. start is double-digit ms; create is seconds. Collapsing them forces every call to pay the worst-case latency. Robots running HTTP-shape wake-on-request workflows can't afford that.
  2. Cost asymmetry. stopped Machines aren't billed; created-and-never-used Machines are. Clients that choose the lifecycle control when they pay.
  3. State asymmetry. stop preserves the filesystem on the worker's NVMe; create starts from the base image. LLM clients doing stateful, incremental builds need that preservation; they can't afford to redo the build every cycle.

Consequences at the orchestrator

For the orchestrator (in Fly.io's case flyd) the two-path split means:

  • create allocates and pins. Decides which worker the Machine goes on. Sets up the root filesystem.
  • stop keeps the allocation. Machine stays on the same worker; filesystem stays on the worker's NVMe. Freed-up resources: CPU / RAM (not disk).
  • start re-runs Firecracker against the prepared disk on the same worker. No scheduling decision. No network set-up.

This is the critical design move: start does not re-schedule. It doesn't pick a worker. It doesn't allocate networking. It doesn't pull an image. Fly's claim that start is "substantially faster than booting up even a non-virtualized K8s Pod" holds because a K8s Pod boot re-runs scheduling (admission, node selection) every time.
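The split can be sketched as a transition table (step names are hypothetical, not flyd internals): only create runs scheduling, filesystem layout, and network setup, while start from stopped re-runs nothing but the VM boot.

```python
# Hypothetical transition table: which orchestrator work each operation runs.
TRANSITIONS = {
    ("absent",  "create"): ("created", ["pick_worker", "pull_image",
                                        "layout_rootfs", "setup_network"]),
    ("created", "start"):  ("running", ["boot_vm"]),
    ("running", "stop"):   ("stopped", ["halt_vm"]),  # keeps worker pin + rootfs
    ("stopped", "start"):  ("running", ["boot_vm"]),  # no scheduling, no net setup
}

def apply(state: str, op: str) -> tuple[str, list[str]]:
    """Return the new Machine state and the work the orchestrator must do."""
    if (state, op) not in TRANSITIONS:
        raise ValueError(f"illegal {op} from state {state}")
    return TRANSITIONS[(state, op)]
```

The table makes the asymmetry visible: every row for start carries only `boot_vm`, never a scheduling or network step.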

Lambda / EC2 positional framing

Fly.io positions this shape as a hybrid of Lambda and EC2:

Like a Lambda invocation, a Fly Machine can start like it's spring-loaded, in double-digit millis. But unlike Lambda, it can stick around as long as you want it to: you can run a server, or a 36-hour batch job, just as easily in a Fly Machine as in an EC2 VM.

(Source: sources/2025-04-08-flyio-our-best-customers-are-now-robots)

The shape borrows from both sides:

  • Lambda side: start latency (shared Firecracker hypervisor), scale-to-zero while stopped, billed only while running.
  • EC2 side: Machine persists across start/stop; filesystem survives; long-running workloads allowed.

Why this is an RX primitive

The two-path split is the compute-side half of the RX argument. Vibe-coding workloads (concepts/vibe-coding) are bursty-then-idle:

  • Active minute → start → pay while running.
  • Idle hours → stop → don't pay, don't lose state.
  • Next active minute → start again, fast → pay while running.

No other major cloud-compute primitive exposes this exact cadence. Lambda is per-invocation; EC2 stop/start takes measurable minutes; containers in K8s go through scheduling on every boot. Fly's API contract is the shape this workload wants.
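A toy billing model (all numbers illustrative, not Fly's pricing) makes the cadence concrete: with start/stop, one active minute per hour bills roughly 24 minutes a day, while an always-on Machine bills the full 24 hours.

```python
# Toy billing model; numbers are illustrative, not Fly's pricing.
def billed_s_start_stop(cycles: int, active_s: float,
                        start_latency_s: float = 0.05) -> float:
    """Pay only while running: each cycle bills active time plus the
    (double-digit-ms) start latency. Idle time costs nothing."""
    return cycles * (active_s + start_latency_s)

def billed_s_always_on(cycles: int, active_s: float, idle_s: float) -> float:
    """Leave the Machine running across idle gaps: bill everything."""
    return cycles * (active_s + idle_s)

# One active minute per hour, for 24 hours:
burst  = billed_s_start_stop(24, 60)        # ~24 billed minutes
always = billed_s_always_on(24, 60, 3540)   # full 24 billed hours
```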

Implementation prerequisites

For the pattern to work on another platform:

  1. A Machine-level stop that preserves the filesystem on the worker. Not just a "container exit" — the disk has to stay.
  2. A Machine-level start that reuses the filesystem without re-scheduling. The orchestrator has to keep the Machine pinned to a worker across stop/start.
  3. Billing granularity at start/stop. Otherwise tenants pay for idle time and the pattern degenerates.
  4. Fast-enough boot to fit in an HTTP request. Firecracker or equivalent. Otherwise the start path doesn't wake on demand.

Open questions / limits

  • Worker eviction. A stopped Machine pinned to one worker is a scheduling-stiffness cost the orchestrator pays. If the worker fails or is drained, the Machine has to be migrated (or re-created elsewhere, losing state). Fly.io's 2024 migration rebuild (patterns/async-block-clone-for-stateful-migration) addresses the migration case.
  • Long-stopped resource cost. The filesystem keeps consuming NVMe even when not billed. Fly's billing model accounts for this implicitly; another platform would have to decide how to price long-stopped disks.
  • Re-creation semantics. If the Machine's base image changes, does start pick up the new image? No — start is a bit-for-bit re-boot of the laid-out disk. The tenant has to create a new Machine to pick up an image change.
  • Failure modes of robots starting many Machines. A compromised / runaway LLM client could start thousands of stopped Machines in a loop. Quotas and rate limits have to catch this at the API tier.
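For the last point, a plain token bucket at the API tier is one possible guard; the class and its parameters are illustrative, not Fly's actual quota system.

```python
# Illustrative per-tenant rate limit on `start` calls at the API tier.
class TokenBucket:
    def __init__(self, rate_per_s: float, burst: int, now: float = 0.0):
        self.rate = rate_per_s      # sustained starts allowed per second
        self.capacity = burst       # short-burst allowance
        self.tokens = float(burst)
        self.last = now

    def allow(self, now: float) -> bool:
        """Refill by elapsed time, then spend one token if available."""
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# A tenant allowed 1 start/s with a burst of 2 saturates quickly in a tight loop:
bucket = TokenBucket(rate_per_s=1.0, burst=2)
```

A runaway loop exhausts the burst immediately and is then throttled to the sustained rate, regardless of how many stopped Machines the tenant owns.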

Known uses

  • Fly.io (2025-04-08 and earlier) — canonical wiki instance. create/start/stop primitives on the Fly Machines API; the subject of the 2025-04-08 "robots" post's compute-side claim.