
CONCEPT

Agent loop over a stateless LLM

Definition

An LLM is a stateless black box: every call takes an input array and returns an output. The appearance of a multi-turn conversation is an illusion the surrounding program casts on itself by remembering every message on both sides and replaying the whole array on every call.

Fly.io's canonical framing:

"A subtler thing to notice: we just had a multi-turn conversation with an LLM. To do that, we remembered everything we said, and everything the LLM said back, and played it back with every LLM call. The LLM itself is a stateless black box. The conversation we're having is an illusion we cast, on ourselves." (Source: sources/2025-11-06-flyio-you-should-write-an-agent.)

The Fly.io post distils the primitive to ~15 lines of Python: context is a list; process(line) appends the user message, calls the Responses API with input=context, appends the model's reply, and returns the reply text. The same list is handed back on every subsequent call.
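That loop can be sketched in a few lines. This is a hedged reconstruction, not the post's exact code: call_llm stands in for the real Responses API call, and the make_agent/process names are illustrative.

```python
def make_agent(call_llm):
    """Wrap a stateless LLM call so it looks like a stateful conversation."""
    context = []  # the entire "conversation" is just this list

    def process(line):
        context.append({"role": "user", "content": line})
        reply = call_llm(context)  # full history replayed on every call
        context.append({"role": "assistant", "content": reply})
        return reply

    return process, context
```

With a real backend, call_llm would be a thin wrapper around the provider's single endpoint (for OpenAI, something like client.responses.create(model=..., input=context) and reading back the output text); nothing about the loop changes.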

Consequences

Because the "conversation" is a plain data structure that the program owns, it can be:

  • Forked. Run two personalities in parallel on the same input by keeping two context arrays (Fly.io's Alph / Ralph multiple-personality demo — same user line appended to both arrays, coin-flip which context drives the next call).
  • Compressed. Summarise older slices and splice the summary back in, freeing token budget.
  • Segregated. Spawn a sub-agent with its own fresh array (patterns/context-segregated-sub-agents); return only a summary to the parent.
  • Replayed, diffed, logged, audited. It's just a list.
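The forking case is just list manipulation. A sketch of the Alph / Ralph demo follows, with call_llm again a stand-in for the API call and the function and dict names my own; the mechanics (same user line appended to every context, coin-flip picks which context drives the call) are from the post.

```python
import random

def forked_turn(call_llm, contexts, user_line):
    """One turn of the multiple-personality demo: every personality hears
    the user, but a randomly chosen context drives the actual LLM call."""
    for ctx in contexts.values():
        ctx.append({"role": "user", "content": user_line})
    name = random.choice(sorted(contexts))      # the coin flip
    reply = call_llm(contexts[name])
    contexts[name].append({"role": "assistant", "content": reply})
    return name, reply

# Two personalities = two independent context arrays.
contexts = {
    "alph": [{"role": "system", "content": "You are cheerful."}],
    "ralph": [{"role": "system", "content": "You are grumpy."}],
}
```

Because each "personality" is nothing but its own array, the same trick generalises to any number of parallel agents over one input stream.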

The single-call API (OpenAI Responses, Anthropic Messages, Google Vertex) is the entire substrate — "It's an HTTP API with, like, one important endpoint." Everything else — turns, tool use, sub-agents, memory — is program logic on top.

Why the framing matters

Most of the mystery around LLM agents dissolves once you accept that:

  1. There is no hidden state.
  2. The impression that the model "remembers what I said earlier" is produced by your code sending those earlier messages back with every call.
  3. Therefore every "memory", "session", "conversation history", or "agent state" primitive you see is implementable as a Python list of messages plus a wrapper around the LLM call.

The minimal tool-using agent loop follows directly: append the user message, call the LLM with a tools list; if the response contains a tool call, execute it, append its output, and call again; otherwise append the assistant message and return. See patterns/tool-call-loop-minimal-agent.
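The loop just described can be sketched directly. The response shapes here are illustrative, not any provider's actual wire format (real APIs return typed tool-call items and also expect the model's tool-call message echoed back); call_llm and the dict keys are assumptions.

```python
def run_turn(call_llm, tools, context, user_line):
    """Minimal tool-call loop: keep calling the LLM until it stops
    asking for tools, feeding each tool's output back into context."""
    context.append({"role": "user", "content": user_line})
    while True:
        response = call_llm(context)  # stateless: full history every time
        if "tool_call" in response:
            call = response["tool_call"]
            result = tools[call["name"]](**call["args"])  # run tool locally
            # Real APIs also want the model's tool-call item appended;
            # here a single "tool" record stands in for both.
            context.append({"role": "tool", "content": str(result)})
        else:
            context.append({"role": "assistant", "content": response["text"]})
            return response["text"]
```

Everything agent-like lives in this while loop and the list it mutates; the LLM call itself never changes.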

Seen in

  • Fly.io, You Should Write An Agent (2025-11-06) — canonical 15-LoC statement of the primitive; "The LLM itself is a stateless black box. The conversation we're having is an illusion we cast, on ourselves." (Source: sources/2025-11-06-flyio-you-should-write-an-agent.)