Building Agents that Don't Break Themselves
Summary
A practical guide from Fly.io demonstrating how to architect AI agent systems so the agent loop (brain) lives on stable infrastructure while all risky command execution (hands) is dispatched to ephemeral, disposable Fly Sprites. Two production implementations — SpriteDoc (multi-user troubleshooting agent) and Hermes Agent (open-source personal agent from Nous Research) — illustrate opposite lifecycle policies on the same building block: per-session ephemeral and per-task persistent, respectively. The post also shows ephemeral credential injection (token never at rest in the sandbox) and copy-on-write checkpoint/restore as an undo mechanism for destructive commands.
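The two lifecycle policies can be pictured as a single switch on an otherwise identical sandbox manager. The sketch below is a minimal in-memory illustration, not the Sprites API: `FakeSandbox`, `SandboxManager`, `Lifecycle`, and the method names are all hypothetical, and "installing a CLI" is modeled as adding a string to a set.

```python
from dataclasses import dataclass, field
from enum import Enum


class Lifecycle(Enum):
    PER_SESSION_EPHEMERAL = "per-session"  # SpriteDoc-style: destroyed at session end
    PER_TASK_PERSISTENT = "per-task"       # Hermes-style: kept and resumed later


@dataclass
class FakeSandbox:
    """Hypothetical in-memory stand-in for a sandbox (not a real Sprite)."""
    key: str
    installed: set = field(default_factory=set)


class SandboxManager:
    """Hands out sandboxes; the lifecycle policy is the only thing that varies."""

    def __init__(self, policy):
        self.policy = policy
        self._live = {}

    def acquire(self, session_id, task_id):
        # The policy decides which identifier the sandbox is keyed on.
        if self.policy is Lifecycle.PER_SESSION_EPHEMERAL:
            key = session_id
        else:
            key = task_id
        # Created lazily on first use, reused afterwards.
        if key not in self._live:
            self._live[key] = FakeSandbox(key)
        return self._live[key]

    def end_session(self, session_id):
        # Only the ephemeral policy disposes of anything when a session ends.
        if self.policy is Lifecycle.PER_SESSION_EPHEMERAL:
            self._live.pop(session_id, None)


ephemeral = SandboxManager(Lifecycle.PER_SESSION_EPHEMERAL)
ephemeral.acquire("session-1", "task-a").installed.add("some-cli")
ephemeral.end_session("session-1")   # the sandbox and its tooling are gone

persistent = SandboxManager(Lifecycle.PER_TASK_PERSISTENT)
persistent.acquire("session-1", "task-a").installed.add("some-cli")
persistent.end_session("session-1")  # no-op: the task's sandbox survives
```

Under these assumptions, a later `persistent.acquire("session-2", "task-a")` returns the same sandbox with `some-cli` still present, while the ephemeral manager hands back a clean one.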
Key Takeaways
- Separate where the agent lives from where it executes code. The agent loop is long-lived and benefits from persistent memory/skills; command execution should happen in an isolated, throwaway environment that can't damage the agent itself.
- Per-session ephemeral sandbox (SpriteDoc pattern). Each user session spins up a fresh Sprite on first filesystem access, uploads project source trees, and installs required CLIs. The Sprite is torn down when the session ends, leaving no persistent footprint. Idle cost is near-zero because a Sprite that drops from warm to cold stops being metered.
- Per-task persistent sandbox (Hermes pattern). The opposite lifecycle on the same substrate: one Sprite per task, resumed across sessions so installed tooling persists. Same isolation guarantee, different disposal policy: one config decision.
- Ephemeral credential injection. User tokens are injected into the sandbox environment only for the duration of a single command, then removed. The credential is never written to disk and never stored at rest in the Sprite. If the Sprite is later inspected or compromised, no token exists to steal.
- Approval prompts become unnecessary when the sandbox is the security boundary. Hermes skips "are you sure?" confirmations on destructive commands because the sandbox physically cannot affect the host.
- Double-sandbox for defense-in-depth. Even an agent already running inside a sandbox should dispatch untrusted strings to a different sandbox. Kyle demonstrated this with Hermes running in one Sprite and executing commands in a second Sprite with a different identity and boot.
- Checkpoint/restore as undo button. Copy-on-write checkpointing before risky steps makes "restore and retry" the worst case instead of "restore from backup." Demonstrated: `rm -rf` wiped app and git; a 9-second checkpoint restore recovered both to the byte.
- Cheap checkpointing as reflex. Because checkpoints are copy-on-write, the cost is negligible. Agents that can roll back can run unattended: the failure mode is "retry", not "rebuild."
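Two of the mechanisms above, per-command credential injection and checkpoint-before-risky-step, can be approximated on a local machine. This is a rough sketch, not how Sprites implements either one. The post does not describe the injection mechanism, so the token is simply passed in a child process's environment. The checkpoint is a full directory copy via `shutil.copytree`, standing in for a copy-on-write snapshot. The variable name `SANDBOX_TOKEN` and both function names are hypothetical.

```python
import os
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_token(argv, token, cwd):
    """Run one command with the token visible only in that child's environment.

    The token is never written to disk and never added to this process's
    own environment, so nothing remains once the command exits.
    """
    env = {**os.environ, "SANDBOX_TOKEN": token}  # hypothetical variable name
    return subprocess.run(argv, env=env, cwd=cwd, capture_output=True, text=True)


def run_with_checkpoint(workdir, risky_step, looks_ok):
    """Snapshot workdir, run risky_step, and roll back if the result is bad.

    Returns True if the step's result was kept, False if it was rolled back.
    """
    workdir = Path(workdir)
    with tempfile.TemporaryDirectory() as tmp:
        snapshot = Path(tmp) / "snapshot"
        shutil.copytree(workdir, snapshot)  # full copy: a stand-in for copy-on-write
        try:
            risky_step(workdir)
            ok = looks_ok(workdir)
        except Exception:
            ok = False  # a crashing step is treated like a failed one
        if ok:
            return True
        # Worst case is "restore and retry": discard the damage, put the copy back.
        shutil.rmtree(workdir, ignore_errors=True)
        shutil.copytree(snapshot, workdir)
        return False


# Demo: a destructive step wipes the directory; the checkpoint brings it back.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "app.py").write_text("print('hello')\n")

result = run_with_token(
    [sys.executable, "-c", "import os; print(len(os.environ['SANDBOX_TOKEN']))"],
    token="not-a-real-token",
    cwd=demo_dir,
)
kept = run_with_checkpoint(
    demo_dir,
    risky_step=lambda d: shutil.rmtree(d),
    looks_ok=lambda d: (d / "app.py").exists(),
)
```

In the demo, `kept` is `False` and `app.py` is back in place after the rollback. A real copy-on-write checkpoint avoids the full copy, which is what makes checkpointing before every risky step affordable.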
Operational Numbers
- Sprite spin-up: fast enough to be "all but unnoticeable" (sub-2s based on prior Fly.io disclosures)
- Checkpoint restore: ~9 seconds demonstrated
- Idle cost: near-zero (warm→cold status drops stop metering)
Caveats
- Tier-3 source; architecturally interesting but vendor-specific to Fly.io's Sprites product
- No performance benchmarks or latency numbers beyond the single ~9-second restore demo and qualitative claims
- Multi-user isolation relies entirely on Sprites' VM-level isolation — no discussion of escape risks
Source
- Original: https://fly.io/blog/building-agents-that-dont-break-themselves/
- Raw markdown: raw/flyio/2026-06-08-building-agents-that-dont-break-themselves-113f70f2.md
Related
- concepts/agent-brain-hands-decoupling — the foundational architectural split this post exemplifies
- patterns/credentialed-proxy-sandbox — related pattern; this post's ephemeral injection is a simpler variant
- systems/fly-sprites — the execution substrate used in both examples
- concepts/durable-vs-ephemeral-sandbox — this post shows both lifecycle ends on the same substrate
- patterns/checkpoint-before-risky-step — the undo mechanism demonstrated