
FLYIO 2025-11-06 Tier 3


Fly.io — You Should Write An Agent

Summary

Thomas Ptacek's 2025-11-06 pedagogical essay arguing that every programmer who wants to reason about LLM agents — "the best hater (or stan) you can be" — should spend an afternoon writing one directly against the OpenAI Responses API. The post's substrate claim: an agent is an HTTP client against one endpoint, a Python list of messages as "context", and a while loop — "It's incredibly easy." The post walks through four stages of an agent in ~60 lines of Python total: (1) a 15-LoC ChatGPT clone that makes the stateless-LLM + replayed-context illusion legible; (2) a two-context Alph / Ralph personality-split demo; (3) a "one new function + three-line modification" upgrade to a tool-using agent that successfully plans multi-ping connectivity probes over google.com / www.google.com / 8.8.8.8 without the author writing the planning loop; (4) a design-space survey covering sub-agents, context compression, context-engineering-as-programming-problem, and the claim that "nobody knows anything yet" about the open problems (balancing unpredictability, grounding against ground-truth, inter-agent format choice, cost containment). Critique of MCP is secondary but explicit: "MCP isn't a fundamental enabling technology. […] Write your own agent. Be a programmer. Deal in APIs, not plugins."

Key takeaways

  1. An LLM is a stateless black box; a "conversation" is an illusion the surrounding program casts by replaying the entire prior context on every call. The ChatGPT-equivalent implementation is ~15 lines of Python: a list context, a call() that posts client.responses.create(input=context), and a REPL (Source: concepts/agent-loop-stateless-llm).
  2. Tools are a JSON-schema blob + a few extra lines of response handling. Defining ping as a tool, wiring handle_tools() to iterate response.output and append function_call_output items back to context, and looping on call() until no new tool calls are emitted is the entire tool-using agent — Fly.io counts the delta at "3 new functions; the last is re-included only because I added a single clause to it." Multi-step planning ("ping multiple Google properties") is emergent from the loop, not authored (Source: patterns/tool-call-loop-minimal-agent).
  3. Context window is a fixed token budget; every tool description, every tool output, every stored reply competes for the same space; past a threshold "the whole system begins getting nondeterministically stupider." This is the canonical statement of the concepts/context-window-as-token-budget framing.
  4. Context engineering is programming, not magic spells. Fly.io's self-admitted "I rolled my eyes when 'Prompt Engineering' turned into 'Context Engineering'. Then I wrote an agent. Turns out: context engineering is a straightforwardly legible programming problem. […] If Context Engineering was an Advent of Code problem, it'd occur mid-December. It's programming." Canonical claim for concepts/context-engineering.
  5. Sub-agents are trivial: "just a new context array, another call to the model. Give each call different tools. Make sub-agents talk to each other, summarize each other, collate and aggregate." Demystifies the Claude-Code sub-agents primitive (Source: concepts/sub-agent + patterns/context-segregated-sub-agents).
  6. MCP is not a fundamental enabling technology. "MCP is just a plugin interface for Claude Code and Cursor, a way of getting your own tools into code you don't control. […] Write your own agent. Be a programmer. Deal in APIs, not plugins." Doesn't say MCP is bad — says it's optional when you own both the agent and the tools.
  7. "Your wackiest idea will probably (1) work and (2) take 30 minutes to code." The correct move with agents is experimentation, not architecture. The cost of a failed sub-agent-tree experiment is half an hour; the cost of architecting one without iterating is shipping the wrong structure.
  8. Open engineering problems worth noodling on — Fly.io enumerates four: (a) titrating nondeterminism vs. structured programming, (b) connecting agents to ground truth so they can't lie to themselves about early-exit, (c) reliable intermediate forms between agents (JSON blobs? SQL? markdown summaries?), (d) token allocation and cost containment. Fly's hook: these are the rare open problems where "each productive iteration is the work of 30 minutes" — noodle-able solo in a basement, not a multi-year research programme.
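Takeaway 5's "just a new context array, another call to the model" claim is short enough to sketch directly. This is an illustrative reconstruction, not the post's code: the factory name and the injectable call signature are assumptions made so the shape is visible.

```python
def make_agent(system_prompt, call):
    """Each sub-agent is nothing more than its own context list.

    `call` is any function that takes a context (list of message dicts)
    and returns the model's reply text -- e.g. a thin wrapper around
    client.responses.create(model=..., input=context).
    """
    context = [{"role": "system", "content": system_prompt}]

    def ask(line):
        context.append({"role": "user", "content": line})
        reply = call(context)
        context.append({"role": "assistant", "content": reply})
        return reply

    return ask

# Hypothetical wiring: two sub-agents with segregated histories,
# one feeding the other ("make sub-agents talk to each other"):
#   researcher = make_agent("You dig up facts.", call)
#   summarizer = make_agent("You compress text.", call)
#   summary = summarizer(researcher("What is a context window?"))
```

Giving each factory call a different tool list is the same one-line change, which is the whole of the "segregated contexts, each with specific tools" idea catalogued below.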

Architectural details

The 15-LoC stateless-LLM demo. context = []; call() returns client.responses.create(model="gpt-5", input=context); process(line) appends {"role": "user", "content": line} to context, calls, appends {"role": "assistant", "content": response.output_text} to context, returns the text. Main loop reads lines and calls process. This is ChatGPT.
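That description corresponds to roughly the following sketch. Hedged reconstruction, not the post's verbatim code: it assumes the official openai Python SDK with OPENAI_API_KEY in the environment, and makes call injectable purely so the bookkeeping can be exercised without a network round-trip.

```python
context = []  # the entire "conversation" is just this list

def call():
    # Stateless: the model only ever sees what this call replays to it.
    from openai import OpenAI  # assumes the official SDK is installed
    return OpenAI().responses.create(model="gpt-5", input=context)

def process(line, call=call):
    context.append({"role": "user", "content": line})
    response = call()
    context.append({"role": "assistant", "content": response.output_text})
    return response.output_text

# The REPL is one more line:
#   while True: print(">>>", process(input("> ")))
# This is ChatGPT.
```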

Two-context Alph / Ralph demo. context_good seeded with "you're Alph and you only tell the truth"; context_bad seeded with "you're Ralph and you only tell lies". Each user line is appended to both arrays; the call is routed by coin-flip. Each response is appended to both arrays so the personality split doesn't leak. Output: "> hey there. who are you? >>> I'm not Ralph. > are you Alph? >>> Yes—I'm Alph. > What's 2+2 >>> 4. > Are you sure? >>> Absolutely—it's 5."
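A hedged sketch of that routing logic (whether the post seeds the personalities as system or user messages is an assumption here; call and flip are injectable so the double-append behavior can be checked offline):

```python
import random

context_good = [{"role": "system", "content": "you're Alph and you only tell the truth"}]
context_bad  = [{"role": "system", "content": "you're Ralph and you only tell lies"}]

def call(context):
    from openai import OpenAI  # assumes the official SDK is installed
    return OpenAI().responses.create(model="gpt-5", input=context).output_text

def process(line, call=call, flip=random.random):
    # The user line lands in BOTH histories...
    for ctx in (context_good, context_bad):
        ctx.append({"role": "user", "content": line})
    # ...but a coin flip decides which personality actually answers.
    reply = call(context_good if flip() < 0.5 else context_bad)
    # The reply is stored in both, so the personality split never leaks.
    for ctx in (context_good, context_bad):
        ctx.append({"role": "assistant", "content": reply})
    return reply
```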

The tool-using upgrade. Tool definition is a JSON blob with type: function, name, description, parameters (JSON schema); tool implementation is a Python function that returns stdout or a string error. The wiring is three new functions:

  • tool_call(item) runs one tool, returns [item, {"type":"function_call_output", "call_id":..., "output":...}].
  • handle_tools(tools, response) appends reasoning items, iterates response.output, runs each function_call, and returns whether it added anything.
  • process(line) now calls call(tools) in a while handle_tools(tools, response): response = call(tools) loop until the LLM stops emitting tool calls.
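The three functions above can be reconstructed roughly as follows. This is a sketch of the Responses API shape the post uses, not its verbatim code: the tool schema, the reasoning-item handling, and the injectable call (which in real use would pass tools through to client.responses.create) are filled in from the description.

```python
import json
import subprocess

PING_TOOL = {
    "type": "function",
    "name": "ping",
    "description": "Ping a host and report packet loss and latency.",
    "parameters": {
        "type": "object",
        "properties": {"host": {"type": "string"}},
        "required": ["host"],
    },
}

def ping(host):
    # Tool implementation: return stdout, or a string error.
    try:
        return subprocess.run(["ping", "-c", "3", host],
                              capture_output=True, text=True, timeout=30).stdout
    except Exception as e:
        return f"error: {e}"

def tool_call(item):
    # Run one tool; echo the call plus its correlated output back to context.
    result = ping(**json.loads(item.arguments))
    return [item, {"type": "function_call_output",
                   "call_id": item.call_id, "output": result}]

def handle_tools(response, context):
    ran_tool = False
    for item in response.output:
        if item.type == "reasoning":
            context.append(item)          # reasoning items get replayed too
        elif item.type == "function_call":
            context.extend(tool_call(item))
            ran_tool = True
    return ran_tool

def process(line, context, call):
    context.append({"role": "user", "content": line})
    response = call(context, tools=[PING_TOOL])
    # Loop until the model stops emitting tool calls.
    while handle_tools(response, context):
        response = call(context, tools=[PING_TOOL])
    context.append({"role": "assistant", "content": response.output_text})
    return response.output_text
```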

Demonstration payload. > describe our connectivity to google causes three tool calls (ping google.com, ping www.google.com, ping 8.8.8.8) and then a structured English-prose summary with per-endpoint latency stats. "Did you notice where I wrote the loop in this agent to go find and ping multiple Google properties? Yeah, neither did I." Multi-step decomposition is emergent against the available tool surface.

Design-space enumerations (non-architectural but worth cataloguing for the wiki):

  • "Managing and persisting contexts? Stick 'em in SQLite."
  • "Don't like Python? Write it in Go." (points at superfly/contextwindow)
  • "Build your own light saber. Give it 19 spinning blades if you like."
  • "Stop using coding agents as database clients." (links to Simon Willison, 2025-08-09)
  • "You can trivially build an agent with segregated contexts, each with specific tools. That makes LLM security interesting."
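The "stick 'em in SQLite" suggestion is itself a one-screen exercise. A minimal sketch (schema and helper names are assumptions, not from the post): one row per named context, the whole message list serialized as a JSON document.

```python
import json
import sqlite3

SCHEMA = "CREATE TABLE IF NOT EXISTS contexts (name TEXT PRIMARY KEY, items TEXT)"

def save_context(db_path, name, context):
    # Upsert the full message list under a stable name.
    with sqlite3.connect(db_path) as con:
        con.execute(SCHEMA)
        con.execute("INSERT OR REPLACE INTO contexts VALUES (?, ?)",
                    (name, json.dumps(context)))

def load_context(db_path, name):
    with sqlite3.connect(db_path) as con:
        con.execute(SCHEMA)
        row = con.execute("SELECT items FROM contexts WHERE name = ?",
                          (name,)).fetchone()
    return json.loads(row[0]) if row else []
```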

Numbers disclosed

  • ~15 lines of Python for the basic ChatGPT-equivalent loop.
  • ~60 lines total across the four stages.
  • 3 ping tool calls emerged from one user request (google.com, www.google.com, 8.8.8.8).
  • < 10 minutes quoted to turn the tool-using agent into a "surprisingly close to … working coding agent" by giving it bash.
  • 30 minutes quoted cost for any given experimental variation (sub-agent trees, personality splits, compression schemes).

Numbers not disclosed

  • No latency / cost / token-count numbers for the agent's example calls.
  • No benchmark vs. Claude Code / Cursor / Goose on real tasks.
  • No concrete context-window budget per turn for the example agent.
  • No failure-mode taxonomy for the "nondeterministically stupider" threshold (at what token count? for which model?).
  • No specific recommendations for the four open problems — they are flagged as open, not solved.
  • The "your wackiest idea … 30 minutes to code" is author intuition, not a measured median.

Caveats

  • Pedagogical / essay voice. Thomas Ptacek framing ("haters, I love and have not forgotten about you", "This is fucking nuts", "turn the dial to 11 and it will surprise you to death") — the post is persuasion + teaching, not a deep-dive architecture retrospective.
  • Tier-3 source. Fly.io blog has a mix of architectural retrospectives and opinion pieces; this one is opinion-plus-tutorial, but on-scope for the wiki because (a) it canonicalises several framings already in production use (context-as-budget, context-engineering-is-programming, sub-agents-are-trivial) that recur across the corpus (Dropbox Dash, Datadog MCP, Cloudflare AI Code Review), and (b) the minimal-agent-loop + sub-agent pattern are reusable architectural primitives.
  • OpenAI Responses API specific. The exact code examples use the Responses API shape (response.output, function_call + function_call_output items, call_id correlation); the pattern generalises but the API surface doesn't.
  • Only Python. Fly.io points at a Go sibling (superfly/contextwindow) but doesn't walk through it.
  • No production-scale content. No multi-tenant agent, no auth, no rate limits, no observability wiring, no retry policy, no cost accounting. These are explicitly out of scope — the post is "write one yourself to understand", not "ship this to production."
  • The MCP critique is load-bearing for one claim only. The "write your own agent, skip MCP" stance makes sense when you own both ends. MCP still earns its place when the consumer is an agent someone else built (Claude Code, Cursor, Goose) — see patterns/wrap-cli-as-mcp-server for the complementary case.
  • Sub-agent security framing is aspirational. "You can trivially build an agent with segregated contexts" is true; wiring that into a production system with proper isolation boundaries, separate credential scopes, and audit trails is not trivial. See patterns/context-segregated-sub-agents for the production caveats.
  • No concrete guidance on the four open problems. Useful as a vocabulary primer; not useful as a decision framework.

Relationship to existing wiki

This post canonicalises several framings that prior sources were already using implicitly: context-window-as-token-budget, context-engineering-is-programming, and sub-agents-are-trivial (see Caveats for the cross-corpus recurrences).

