Fly.io — You Should Write An Agent¶
Summary¶
Thomas Ptacek's 2025-11-06 pedagogical essay arguing that every
programmer who wants to reason about LLM agents — "the best hater
(or stan) you can be" — should spend an afternoon writing one
against the OpenAI Responses API directly. The post's substrate
claim: an agent is an HTTP client against one endpoint, a Python
list of strings as "context", and a while loop — "It's
incredibly easy." The post walks through four stages of an agent
in ~60 lines of Python total: (1) a 15-LoC ChatGPT clone that
makes the stateless-LLM + replayed-context illusion legible;
(2) a two-context Alph / Ralph personality-split demo; (3) a
"one new function + three-line modification" upgrade to a
tool-using agent that successfully plans multi-ping connectivity
probes over google.com / www.google.com / 8.8.8.8
without the author writing the planning loop; (4) a
design-space survey covering sub-agents, context compression,
context-engineering-as-programming-problem, and the claim that
"nobody knows anything yet" about the open problems (balancing
unpredictability, grounding against ground-truth, inter-agent
format choice, cost containment). Critique of MCP is secondary
but explicit: "MCP isn't a fundamental enabling technology. […]
Write your own agent. Be a programmer. Deal in APIs, not
plugins."
Key takeaways¶
- An LLM is a stateless black box; a "conversation" is an illusion the surrounding program casts by replaying the entire prior context on every call. The ChatGPT-equivalent implementation is ~15 lines of Python: a list `context`, a `call()` that posts `client.responses.create(input=context)`, and a REPL (Source: concepts/agent-loop-stateless-llm).
- Tools are a JSON-schema blob + a few extra lines of response handling. Defining `ping` as a tool, wiring `handle_tools()` to iterate `response.output` and append `function_call_output` items back to `context`, and looping on `call()` until no new tool calls are emitted is the entire tool-using agent — Fly.io counts the delta at "3 new functions; the last is re-included only because I added a single clause to it." Multi-step planning ("ping multiple Google properties") is emergent from the loop, not authored (Source: patterns/tool-call-loop-minimal-agent).
- Context window is a fixed token budget; every tool description, every tool output, every stored reply competes for the same space; past a threshold "the whole system begins getting nondeterministically stupider." This is the canonical statement of the concepts/context-window-as-token-budget framing.
- Context engineering is programming, not magic spells. Fly.io's self-admitted "I rolled my eyes when 'Prompt Engineering' turned into 'Context Engineering'. Then I wrote an agent. Turns out: context engineering is a straightforwardly legible programming problem. […] If Context Engineering was an Advent of Code problem, it'd occur mid-December. It's programming." Canonical claim for concepts/context-engineering.
- Sub-agents are trivial — "just a new context array, another call to the model. Give each call different tools. Make sub-agents talk to each other, summarize each other, collate and aggregate." Demystifies the Claude-Code sub-agents primitive (Source: concepts/sub-agent + patterns/context-segregated-sub-agents).
- MCP is not a fundamental enabling technology. "MCP is just a plugin interface for Claude Code and Cursor, a way of getting your own tools into code you don't control. […] Write your own agent. Be a programmer. Deal in APIs, not plugins." Doesn't say MCP is bad — says it's optional when you own both the agent and the tools.
- "Your wackiest idea will probably (1) work and (2) take 30 minutes to code." The correct move with agents is experimentation, not architecture. The cost of a failed sub-agent-tree experiment is half an hour; the cost of architecting one without iterating is shipping the wrong structure.
- Open engineering problems worth noodling on — Fly.io enumerates four: (a) titrating nondeterminism vs. structured programming, (b) connecting agents to ground truth so they can't lie to themselves about early-exit, (c) reliable intermediate forms between agents (JSON blobs? SQL? markdown summaries?), (d) token allocation and cost containment. Fly's hook: these are the rare open problems where "each productive iteration is the work of 30 minutes" — noodle-able solo in a basement, not a multi-year research programme.
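The sub-agents bullet above is short enough to show as code. This is a toy sketch under the same Responses-API assumptions as the post, not the post's own code; the helper names (`run_subagent`, `summarize_then_critique`) and the persona strings are hypothetical, and `client` is assumed to be an `openai.OpenAI()` instance:

```python
def run_subagent(client, persona, task, tools=None):
    # A sub-agent is just a fresh context array and its own model call,
    # optionally with a different (smaller) tool surface.
    context = [{"role": "system", "content": persona},
               {"role": "user", "content": task}]
    response = client.responses.create(model="gpt-5", input=context,
                                       tools=tools or [])
    return response.output_text

def summarize_then_critique(client, document):
    # Two segregated contexts "talking" to each other: the critic never
    # sees the original document, only the summarizer's output.
    summary = run_subagent(client, "You summarize text tersely.", document)
    return run_subagent(client, "You critique summaries.", summary)
```

The client is passed in rather than constructed globally so the composition can be exercised with a stub.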
Architectural details¶
The 15-LoC stateless-LLM demo. `context = []`; `call()`
returns `client.responses.create(model="gpt-5", input=context)`;
`process(line)` appends `{"role": "user", "content": line}` to
`context`, calls the model, appends `{"role": "assistant", "content":
response.output_text}` to `context`, and returns the text. The main
loop reads lines and calls `process`. This is ChatGPT.
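The loop just described can be sketched directly. This is a paraphrase of the shape the post walks through, not its verbatim code; `client` is assumed to be an `openai.OpenAI()` instance, passed in so the loop stays testable:

```python
# Minimal agent loop: the model is stateless, so every call replays
# the entire context; the "conversation" lives only in this list.
context = []

def call(client):
    # Post the whole context on every request.
    return client.responses.create(model="gpt-5", input=context)

def process(client, line):
    context.append({"role": "user", "content": line})
    response = call(client)
    context.append({"role": "assistant", "content": response.output_text})
    return response.output_text

def repl(client):
    # The REPL that makes this "ChatGPT": read a line, print the reply.
    while True:
        try:
            line = input("> ")
        except EOFError:
            break
        print(">>>", process(client, line))
```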
Two-context Alph / Ralph demo. context_good seeded with
"you're Alph and you only tell the truth"; context_bad
seeded with "you're Ralph and you only tell lies". Each user
line is appended to both arrays; the call is routed by
coin-flip. Each response is appended to both arrays so the
personality split doesn't leak. Output: "> hey there. who are
you? >>> I'm not Ralph. > are you Alph? >>> Yes—I'm Alph. > What's
2+2 >>> 4. > Are you sure? >>> Absolutely—it's 5."
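The two-context routing can be sketched in a few lines. Hedged: the seeding role name (`system`) and the exact call shape are assumptions over the Responses API, not the post's verbatim code; `client` is assumed to be an `openai.OpenAI()` instance:

```python
import random

# Two personas share one user-visible conversation; the split lives
# entirely in which context array actually gets sent to the model.
context_good = [{"role": "system",
                 "content": "you're Alph and you only tell the truth"}]
context_bad = [{"role": "system",
                "content": "you're Ralph and you only tell lies"}]

def process(client, line):
    for ctx in (context_good, context_bad):
        ctx.append({"role": "user", "content": line})
    # Route the call by coin-flip...
    chosen = random.choice((context_good, context_bad))
    response = client.responses.create(model="gpt-5", input=chosen)
    # ...but append the reply to BOTH arrays so the personality split
    # doesn't leak: each persona sees an unbroken conversation.
    for ctx in (context_good, context_bad):
        ctx.append({"role": "assistant", "content": response.output_text})
    return response.output_text
```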
The tool-using upgrade. Tool definition is a JSON blob with
`type: function`, `name`, `description`, `parameters` (JSON
schema); tool implementation is a Python function that returns
stdout or a string error. The wiring is three new functions:
- `tool_call(item)` runs one tool and returns `[item, {"type": "function_call_output", "call_id": ..., "output": ...}]`.
- `handle_tools(tools, response)` appends reasoning items, iterates `response.output`, runs each `function_call`, and returns whether it added anything.
- `process(line)` now calls `call(tools)` in a `while handle_tools(tools, response): response = call(tools)` loop until the LLM stops emitting tool calls.
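A hedged sketch of that wiring, matching the described shapes but not the post's verbatim code (the real `handle_tools` also replays reasoning items, omitted here for brevity; `client` is assumed to be an `openai.OpenAI()` instance):

```python
import json
import subprocess

# Tool definition: a JSON-schema blob the model plans against.
PING_TOOL = {
    "type": "function",
    "name": "ping",
    "description": "Ping a host once and return the raw output.",
    "parameters": {
        "type": "object",
        "properties": {"host": {"type": "string"}},
        "required": ["host"],
    },
}

context = []

def tool_call(item):
    # Run one tool; return the call item plus its correlated output item.
    args = json.loads(item.arguments)
    try:
        out = subprocess.run(["ping", "-c", "1", args["host"]],
                             capture_output=True, text=True,
                             timeout=10).stdout
    except Exception as exc:
        out = f"error: {exc}"
    return [item, {"type": "function_call_output",
                   "call_id": item.call_id, "output": out}]

def handle_tools(response):
    # Append every function_call (and its output) back into the context;
    # report whether anything was added, so the caller knows to loop.
    added = False
    for item in response.output:
        if item.type == "function_call":
            context.extend(tool_call(item))
            added = True
    return added

def process(client, line):
    context.append({"role": "user", "content": line})
    response = client.responses.create(model="gpt-5", input=context,
                                       tools=[PING_TOOL])
    # Keep calling until the model stops emitting tool calls.
    while handle_tools(response):
        response = client.responses.create(model="gpt-5", input=context,
                                           tools=[PING_TOOL])
    context.append({"role": "assistant", "content": response.output_text})
    return response.output_text
```

The multi-step planning lives nowhere in this code: the model decides how many `ping` calls to emit, and the while loop just keeps feeding results back.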
Demonstration payload. > describe our connectivity to
google causes three tool calls (ping google.com, ping
www.google.com, ping 8.8.8.8) and then a structured
English-prose summary with per-endpoint latency stats. "Did you
notice where I wrote the loop in this agent to go find and ping
multiple Google properties? Yeah, neither did I." Multi-step
decomposition is emergent against the available tool surface.
Design-space enumerations (non-architectural but worth cataloguing for the wiki):
- "Managing and persisting contexts? Stick 'em in SQLite."
- "Don't like Python? Write it in Go." (points at superfly/contextwindow)
- "Build your own light saber. Give it 19 spinning blades if you like."
- "Stop using coding agents as database clients." (links to Simon Willison, 2025-08-09)
- "You can trivially build an agent with segregated contexts, each with specific tools. That makes LLM security interesting."
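The SQLite line above is literal: a context is just a JSON-serialisable list, so persistence is a two-function affair. A sketch with hypothetical table and function names, not anything from the post:

```python
import json
import sqlite3

def save_context(db, name, context):
    # One row per named conversation; the whole context array as JSON.
    db.execute("CREATE TABLE IF NOT EXISTS contexts"
               " (name TEXT PRIMARY KEY, items TEXT)")
    db.execute("INSERT OR REPLACE INTO contexts VALUES (?, ?)",
               (name, json.dumps(context)))
    db.commit()

def load_context(db, name):
    row = db.execute("SELECT items FROM contexts WHERE name = ?",
                     (name,)).fetchone()
    return json.loads(row[0]) if row else []
```

An in-memory database (`sqlite3.connect(":memory:")`) is enough for basement-scale experiments.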
Numbers disclosed¶
- ~15 lines of Python for the basic ChatGPT-equivalent loop.
- ~60 lines total across the four stages.
- 3 `ping` tool calls emerged from one user request (google.com, www.google.com, 8.8.8.8).
- < 10 minutes quoted to turn the tool-using agent into a "surprisingly close to … working coding agent" by giving it `bash`.
- 30 minutes quoted cost for any given experimental variation (sub-agent trees, personality splits, compression schemes).
Numbers not disclosed¶
- No latency / cost / token-count numbers for the agent's example calls.
- No benchmark vs. Claude Code / Cursor / Goose on real tasks.
- No concrete context-window budget per turn for the example agent.
- No failure-mode taxonomy for the "nondeterministically stupider" threshold (at what token count? for which model?).
- No specific recommendations for the four open problems — they are flagged as open, not solved.
- The "your wackiest idea … 30 minutes to code" is author intuition, not a measured median.
Caveats¶
- Pedagogical / essay voice. Thomas Ptacek framing ("haters, I love and have not forgotten about you", "This is fucking nuts", "turn the dial to 11 and it will surprise you to death") — the post is persuasion + teaching, not a deep-dive architecture retrospective.
- Tier-3 source. Fly.io blog has a mix of architectural retrospectives and opinion pieces; this one is opinion-plus-tutorial, but on-scope for the wiki because (a) it canonicalises several framings already in production use (context-as-budget, context-engineering-is-programming, sub-agents-are-trivial) that recur across the corpus (Dropbox Dash, Datadog MCP, Cloudflare AI Code Review), and (b) the minimal-agent-loop + sub-agent patterns are reusable architectural primitives.
- OpenAI Responses API specific. The exact code examples use the Responses API shape (`response.output`, `function_call` + `function_call_output` items, `call_id` correlation); the pattern generalises but the API surface doesn't.
- Only Python. Fly.io points at a Go sibling (superfly/contextwindow) but doesn't walk through it.
- No production-scale content. No multi-tenant agent, no auth, no rate limits, no observability wiring, no retry policy, no cost accounting. These are explicitly out of scope — the post is "write one yourself to understand", not "ship this to production."
- The MCP critique is load-bearing for one claim only. The "write your own agent, skip MCP" stance makes sense when you own both ends. MCP still earns its place when the consumer is an agent someone else built (Claude Code, Cursor, Goose) — see patterns/wrap-cli-as-mcp-server for the complementary case.
- Sub-agent security framing is aspirational. "You can trivially build an agent with segregated contexts" is true; wiring that into a production system with proper isolation boundaries, separate credential scopes, and audit trails is not trivial. See patterns/context-segregated-sub-agents for the production caveats.
- No concrete guidance on the four open problems. Useful as a vocabulary primer; not useful as a decision framework.
Relationship to existing wiki¶
This post canonicalises several framings that prior sources used implicitly:
- concepts/agent-loop-stateless-llm (new) — the "conversation is an illusion we cast on ourselves" framing was implicit in every MCP + agent-loop post on the wiki; Fly.io names it.
- concepts/context-window-as-token-budget (new) — Dropbox Dash and Datadog both cite tool-schemas-in-context as a budget concern; Fly.io generalises the framing.
- concepts/context-engineering (new) — "context engineering is programming" is the first canonical statement on the wiki; pairs with Dropbox Dash's context-engineering post.
- concepts/sub-agent (new) — demystifies the Claude-Code primitive; extends patterns/specialized-agent-decomposition + patterns/coordinator-sub-reviewer-orchestration with a first-principles definition.
- patterns/tool-call-loop-minimal-agent (new) — the minimal loop is the teaching shape behind every tool-using agent on the wiki (flymcp, Datadog MCP server, Cloudflare Agent Lee, AWS DevOps Agent, …).
- patterns/context-segregated-sub-agents (new) — the security-motivated sub-agent pattern; complements patterns/untrusted-input-via-file-not-prompt + patterns/llm-output-as-untrusted-input at the isolation layer.
Source¶
- Original: https://fly.io/blog/everyone-write-an-agent/
- Raw markdown:
raw/flyio/2025-11-06-you-should-write-an-agent-a5a7bbe1.md
Related¶
- companies/flyio
- concepts/agent-loop-stateless-llm
- concepts/context-window-as-token-budget
- concepts/context-engineering
- concepts/sub-agent
- concepts/agentic-development-loop
- patterns/tool-call-loop-minimal-agent
- patterns/context-segregated-sub-agents
- patterns/tool-surface-minimization
- patterns/wrap-cli-as-mcp-server
- systems/model-context-protocol
- systems/claude-code
- sources/2025-04-10-flyio-30-minutes-with-mcp-and-flyctl — Fly.io sibling post on the CLI-wrap-as-MCP pattern
- sources/2025-04-08-flyio-our-best-customers-are-now-robots — Fly.io sibling post on agent-as-platform-customer framing
- sources/2026-03-04-datadog-mcp-server-agent-tools — independent corroboration of tool-schemas-in-context-window
- sources/2025-11-17-dropbox-how-dash-uses-context-engineering-for-smarter-ai — independent canonical statement of context-engineering-as-programming-problem