
CONCEPT Cited by 3 sources

Agentic development loop

Definition

The agentic development loop is a closed-loop LLM code generation workflow: the LLM proposes code, an execution environment runs the code, the environment's output (stdout, stderr, exit codes, test failures, compiler errors, runtime exceptions) is fed back to the LLM, and the LLM uses the feedback to generate the next attempt. The loop iterates until the code passes whatever quality bar is in place (tests green, compiler clean, task complete).

Fly.io's canonical phrasing (2025-02-07) captures what distinguishes this from one-shot code suggestion:

"LLM-generated code is useful in the general case if you know what you're doing. But it's ultra-useful if you can close the loop between the LLM and the execution environment (with an 'Agent' setup). […] it's a semi-effective antidote to hallucination: the LLM generates the code, the agent scaffolding runs the code, the code generates errors, the agent feeds it back to the LLM, the process iterates." (Source: sources/2025-02-07-flyio-vscodes-ssh-agent-is-bananas)

Why it works as an anti-hallucination measure

Hallucination in LLM code generation is the production of code that looks plausible but doesn't compile, doesn't pass tests, calls APIs that don't exist, or has the wrong semantics. A one-shot generator has no way to notice any of that. The agentic loop grounds generation in actual execution signal — a failing test is unambiguous evidence that this attempt is wrong, and a compiler error names the specific call that doesn't exist.

Fly's framing — "a semi-effective antidote to hallucination" — is important: the loop does not eliminate hallucination, it bounds it by a feedback signal. The LLM can still loop unproductively, and the loop can still terminate on code that passes tests but does the wrong thing (the tests can be the hallucinated part). But across many real tasks the loop converges on runnable code far more often than one-shot generation does.

The execution environment is the architectural bottleneck

The loop's utility depends on the existence of an execution environment the LLM can iterate against safely, quickly, and cleanly. Fly.io's framing:

"So, obviously, the issue here is you don't want this iterative development process happening on your development laptop, because LLMs have boundary issues, and they'll iterate on your system configuration just as happily on the Git project you happen to be working in. A thing you'd really like to be able to do: run a closed-loop agent-y ('agentic'? is that what we say now) configuration for an LLM, on a clean-slate Linux instance that spins up instantly and that can't screw you over in any way." (Source: sources/2025-02-07-flyio-vscodes-ssh-agent-is-bananas)

The three execution-environment requirements named:

  1. Clean slate — no state the LLM can get confused by across runs; each iteration starts from a known-good baseline.
  2. Instantly spins up — the loop iterates; slow provisioning throttles the cycle and destroys the UX.
  3. Can't screw you over — blast radius bounded. The LLM's boundary issues (running destructive commands, installing random packages, editing unrelated files) must not touch the dev laptop or production.

Those three requirements are the architectural brief for Fly Machines — Firecracker micro-VMs that boot in under a second and are discardable — and for the broader disposable-VM-for-agentic-loop pattern.
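The clean-slate requirement can be illustrated at the process level. This is a sketch only: a fresh temp directory and scrubbed environment per run gives each iteration a known-good baseline, but it does not provide the "can't screw you over" isolation a disposable micro-VM does:

```python
import os
import subprocess
import sys
import tempfile

def run_clean_slate(source: str, timeout: int = 10):
    """Run a candidate from a known-good baseline: a fresh working
    directory per iteration, a scrubbed environment, a hard timeout.
    Process-level only; a micro-VM would also bound the blast radius."""
    with tempfile.TemporaryDirectory() as workdir:  # discarded after the run
        path = os.path.join(workdir, "attempt.py")
        with open(path, "w") as f:
            f.write(source)
        env = {"PATH": os.environ.get("PATH", "")}  # no inherited state
        return subprocess.run([sys.executable, path], cwd=workdir, env=env,
                              capture_output=True, text=True, timeout=timeout)
```

Anything an iteration writes to its working directory vanishes before the next iteration starts, so no run can confuse its successor.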

  • Capability sandboxing — Cloudflare's capability-based sandbox for Dynamic Workers is framed around the same concern: "what exactly do we want this thing to be able to do?" An LLM-agent execution environment is a prime fit for capability-based posture — the agent shouldn't have any authority it doesn't demonstrably need.
  • Network egress — by default, the agent's execution environment should deny network egress or route it through a controlled proxy; package installs, API calls, and external data fetches are a common source of harm in hallucinated code. (Not discussed in the Fly post, but implied by "can't screw you over in any way.")
  • Persistent-agent remote-dev architectures are a poor substrate. VSCode Remote-SSH ships a persistent agent with FS + PTY + RPC authority onto the target host. For a dev laptop target, that is "apoplectic[ally]" wrong if LLM agents run against it (per Fly). For a disposable VM target, who cares — the VM is the sandbox. The architecture isn't wrong; it's just pointed at the wrong target host.
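As an in-process illustration of deny-by-default egress with an allowlist (the hosts are hypothetical, and a real sandbox enforces this at the firewall or network-namespace layer, not inside the agent's own process):

```python
import socket

# Hypothetical allowlist; shown only to illustrate the deny-by-default
# posture. Real enforcement belongs outside the sandboxed process.
ALLOWED_HOSTS = {"pypi.org", "files.pythonhosted.org"}

_real_create_connection = socket.create_connection

def guarded_create_connection(address, *args, **kwargs):
    """Refuse any outbound connection to a host not on the allowlist."""
    host = address[0]
    if host not in ALLOWED_HOSTS:
        raise PermissionError(f"egress denied: {host}")
    return _real_create_connection(address, *args, **kwargs)

socket.create_connection = guarded_create_connection
```

Because the guard raises before any packet is sent, hallucinated code that fetches from an unexpected host fails loudly, and that failure becomes feedback for the next iteration.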

Seen in

  • sources/2025-02-07-flyio-vscodes-ssh-agent-is-bananas — Fly.io's 2025-02-07 post names the agentic development loop as the motivating use-case for disposable-VM sandboxing and gives the canonical wiki phrasing ("close the loop between the LLM and the execution environment", "semi-effective antidote to hallucination", "clean-slate Linux instance that spins up instantly and can't screw you over").
  • sources/2025-06-20-flyio-phoenixnew-remote-ai-runtime-for-phoenix — Fly.io's 2025-06-20 Phoenix.new post productises the loop with a three-signal fusion: server logs + browser DOM / JS state (via an agent-driven full Chrome) + test-runner output, all streaming into the agent's context from the same session VM the agent has root on. "When Phoenix.new boots an app, it watches the logs, and tests the application. When an action triggers an error, Phoenix.new notices and gets to work." Sharpens the earlier two-signal framing (compiler + tests) that was the wiki's previous default.
  • sources/2025-11-06-flyio-you-should-write-an-agent — Fly.io (Thomas Ptacek, 2025-11-06) gives the minimal-loop foundation the agentic development loop composes over: "LLM-generated code is useful in the general case if you know what you're doing" was the 2025-02-07 framing; this 2025-11-06 post strips the loop to its essentials — a Python list as context, one HTTP endpoint, a while over tool-call responses — and demonstrates it ships in ~30 lines (patterns/tool-call-loop-minimal-agent). Feeding the same loop bash "in less than 10 minutes" gives "surprisingly close to … a working coding agent" — the substrate claim underneath the agentic development loop. Also flags the open problem directly relevant here: "How best to connect agents to ground truth so they can't lie to themselves about having solved a problem too early to exit their loops." Ground-truth verifiers (compiler, tests, executable judge) are the loop's exit condition — without them the loop converges on plausible-looking hallucinations.