
CONCEPT Cited by 3 sources

Agentic development loop

Definition

The agentic development loop is a closed-loop LLM code generation workflow: the LLM proposes code, an execution environment runs the code, the environment's output (stdout, stderr, exit codes, test failures, compiler errors, runtime exceptions) is fed back to the LLM, and the LLM uses the feedback to generate the next attempt. The loop iterates until the code passes whatever quality bar is in place (tests green, compiler clean, task complete).

Fly.io's canonical phrasing (2025-02-07) captures what distinguishes this from one-shot code suggestion:

"LLM-generated code is useful in the general case if you know what you're doing. But it's ultra-useful if you can close the loop between the LLM and the execution environment (with an 'Agent' setup). […] it's a semi-effective antidote to hallucination: the LLM generates the code, the agent scaffolding runs the code, the code generates errors, the agent feeds it back to the LLM, the process iterates." (Source: sources/2025-02-07-flyio-vscodes-ssh-agent-is-bananas)

Why it works as an anti-hallucination measure

Hallucination in LLM code generation is the production of code that looks plausible but doesn't compile, doesn't pass tests, calls APIs that don't exist, or has the wrong semantics. A one-shot generator has no way to notice any of that. The agentic loop grounds generation in actual execution signal — a failing test is unambiguous evidence that this attempt is wrong, and a compiler error names the specific call that doesn't exist.

Fly's framing — "a semi-effective antidote to hallucination" — is important: the loop does not eliminate hallucination, it bounds it by a feedback signal. The LLM can still loop unproductively, and the loop can still terminate on code that passes tests but does the wrong thing (the tests can be the hallucinated part). But across many real tasks the loop converges on runnable code far more often than one-shot generation does.

The execution environment is the architectural bottleneck

The loop's utility depends on the existence of an execution environment the LLM can iterate against safely, quickly, and cleanly. Fly.io's framing:

"So, obviously, the issue here is you don't want this iterative development process happening on your development laptop, because LLMs have boundary issues, and they'll iterate on your system configuration just as happily on the Git project you happen to be working in. A thing you'd really like to be able to do: run a closed-loop agent-y ('agentic'? is that what we say now) configuration for an LLM, on a clean-slate Linux instance that spins up instantly and that can't screw you over in any way." (Source: sources/2025-02-07-flyio-vscodes-ssh-agent-is-bananas)

The three execution-environment requirements named:

  1. Clean slate — no state the LLM can get confused by across runs; each iteration starts from a known-good baseline.
  2. Instantly spins up — the loop iterates; slow provisioning throttles the cycle and destroys the UX.
  3. Can't screw you over — blast radius bounded. The LLM's boundary issues (running destructive commands, installing random packages, editing unrelated files) must not touch the dev laptop or production.

Those three requirements are the architectural brief for Fly Machines — Firecracker micro-VMs that boot in under a second and are discardable — and for the broader disposable-VM-for-agentic-loop pattern.
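The clean-slate requirement can be illustrated at the process level. This is a sketch only: a fresh temp directory and scrubbed environment per run gives each iteration a known-good baseline, but it does not provide the "can't screw you over" isolation a disposable micro-VM does:

```python
import os
import subprocess
import sys
import tempfile

def run_clean_slate(source: str, timeout: int = 10):
    """Run a candidate from a known-good baseline: a fresh working
    directory per iteration, a scrubbed environment, a hard timeout.
    Process-level only; a micro-VM would also bound the blast radius."""
    with tempfile.TemporaryDirectory() as workdir:  # discarded after the run
        path = os.path.join(workdir, "attempt.py")
        with open(path, "w") as f:
            f.write(source)
        env = {"PATH": os.environ.get("PATH", "")}  # no inherited state
        return subprocess.run([sys.executable, path], cwd=workdir, env=env,
                              capture_output=True, text=True, timeout=timeout)
```

Anything an iteration writes to its working directory vanishes before the next iteration starts, so no run can confuse its successor.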

  • Capability sandboxing — Cloudflare's capability-based sandbox for Dynamic Workers is framed around the same concern: "what exactly do we want this thing to be able to do?" An LLM-agent execution environment is a prime fit for capability-based posture — the agent shouldn't have any authority it doesn't demonstrably need.
  • Network egress — by default, the agent's execution environment should deny network egress or route it through a controlled proxy; package installs, API calls, and external data fetches are a common source of harm in hallucinated code. (Not discussed in the Fly post, but implied by "can't screw you over in any way.")
  • Persistent-agent remote-dev architectures are a poor substrate. VSCode Remote-SSH ships a persistent agent with FS + PTY + RPC authority onto the target host. For a dev laptop target, that is "apoplectic[ally]" wrong if LLM agents run against it (per Fly). For a disposable VM target, who cares — the VM is the sandbox. The architecture isn't wrong; it's just pointed at the wrong target host.
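As an in-process illustration of deny-by-default egress with an allowlist (the hosts are hypothetical, and a real sandbox enforces this at the firewall or network-namespace layer, not inside the agent's own process):

```python
import socket

# Hypothetical allowlist; shown only to illustrate the deny-by-default
# posture. Real enforcement belongs outside the sandboxed process.
ALLOWED_HOSTS = {"pypi.org", "files.pythonhosted.org"}

_real_create_connection = socket.create_connection

def guarded_create_connection(address, *args, **kwargs):
    """Refuse any outbound connection to a host not on the allowlist."""
    host = address[0]
    if host not in ALLOWED_HOSTS:
        raise PermissionError(f"egress denied: {host}")
    return _real_create_connection(address, *args, **kwargs)

socket.create_connection = guarded_create_connection
```

Because the guard raises before any packet is sent, hallucinated code that fetches from an unexpected host fails loudly, and that failure becomes feedback for the next iteration.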

Seen in

  • sources/2025-02-07-flyio-vscodes-ssh-agent-is-bananas — Fly.io's 2025-02-07 post names the agentic development loop as the motivating use-case for disposable-VM sandboxing and gives the canonical wiki phrasing ("close the loop between the LLM and the execution environment", "semi-effective antidote to hallucination", "clean-slate Linux instance that spins up instantly and can't screw you over").
  • sources/2025-06-20-flyio-phoenixnew-remote-ai-runtime-for-phoenix — Fly.io's 2025-06-20 Phoenix.new post productises the loop with a three-signal fusion: server logs + browser DOM / JS state (via an agent-driven full Chrome) + test-runner output, all streaming into the agent's context from the same session VM the agent has root on. "When Phoenix.new boots an app, it watches the logs, and tests the application. When an action triggers an error, Phoenix.new notices and gets to work." Sharpens the earlier two-signal framing (compiler + tests) that was the wiki's previous default.
  • sources/2025-11-06-flyio-you-should-write-an-agent — Fly.io (Thomas Ptacek, 2025-11-06) gives the minimal-loop foundation the agentic development loop composes over: "LLM-generated code is useful in the general case if you know what you're doing" was the 2025-02-07 framing; this 2025-11-06 post strips the loop to its essentials — a Python list as context, one HTTP endpoint, a while over tool-call responses — and demonstrates it ships in ~30 lines (patterns/tool-call-loop-minimal-agent). Feeding the same loop bash "in less than 10 minutes" gives "surprisingly close to … a working coding agent" — the substrate claim underneath the agentic development loop. Also flags the open problem directly relevant here: "How best to connect agents to ground truth so they can't lie to themselves about having solved a problem too early to exit their loops." Ground-truth verifiers (compiler, tests, executable judge) are the loop's exit condition — without them the loop converges on plausible-looking hallucinations.