CONCEPT Cited by 1 source
Agent-driven browser¶
Definition¶
Agent-driven browser is an agent-tooling pattern where an LLM agent controls a full browser (DOM + JavaScript state + navigation + event injection + network + console) as a first-class tool, rather than iterating on rasterised screenshots of a UI. The agent sees the same page-state surface a developer would see in Chrome DevTools, programmatically.
Canonical wiki statement¶
Fly.io, 2025-06-20, on Phoenix.new:
Phoenix.new includes, in both its UI and its agent tools, a full browser. The Phoenix.new agent uses that browser 'headlessly' to check its own front-end changes and interact with the app. Because it's a full browser, instead of trying to iterate on screenshots, the agent sees real page content and JavaScript state – with or without a human present.
(Source: sources/2025-06-20-flyio-phoenixnew-remote-ai-runtime-for-phoenix)
The load-bearing clause is "instead of trying to iterate on screenshots, the agent sees real page content and JavaScript state."
What the signal difference buys¶
A screenshot is a rasterisation: pixels and nothing else. An agent looking at one is doing image-to-text on the rendered output and inferring the DOM structure behind it. That works for simple pages but fails on:
- Dynamic content: an ambient loading spinner looks the same pixel-wise whether the underlying network request is still pending or has already failed.
- Form state: whether an input is validated, whether a button is disabled, whether a dropdown is open, is all attribute-level DOM state that doesn't necessarily show up visually.
- JavaScript errors: an exception logged to the console but not surfaced to the user is invisible in a screenshot; it's one console-log access away in a full browser.
- Network state: XHR / WebSocket traffic is a key signal for LiveView / Phoenix Channels-heavy apps and lives entirely outside the screenshot.
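To make the contrast concrete, here is a minimal Python sketch of the kind of messages a CDP client exchanges to reach these signals. It only builds and parses JSON; it does not open a connection to a browser. The method and event names (`Runtime.enable`, `Network.enable`, `Runtime.evaluate`, `Runtime.exceptionThrown`, `Network.loadingFailed`, `Network.responseReceived`) come from the Chrome DevTools Protocol. The `#submit` selector, the helper names, and the signal strings are illustrative assumptions, not something the Phoenix.new post discloses.

```python
import json
from itertools import count

_ids = count(1)  # CDP commands carry a client-chosen integer id


def cdp_command(method, params=None):
    """Serialise one CDP command as the JSON text sent over the WebSocket."""
    return json.dumps({"id": next(_ids), "method": method, "params": params or {}})


# Enable the domains whose events carry the signals a screenshot misses.
SETUP = [
    cdp_command("Runtime.enable"),  # console calls and uncaught exceptions
    cdp_command("Network.enable"),  # request, response, and failure events
]

# Ask for attribute-level form state. "#submit" is a hypothetical selector.
CHECK_BUTTON = cdp_command(
    "Runtime.evaluate",
    {
        "expression": "document.querySelector('#submit').disabled",
        "returnByValue": True,
    },
)


def classify_event(raw):
    """Map one incoming CDP event (JSON text) to a short signal string, or None."""
    event = json.loads(raw)
    method = event.get("method")
    params = event.get("params", {})
    if method == "Runtime.exceptionThrown":
        details = params.get("exceptionDetails", {})
        return "js-error: " + details.get("text", "unknown")
    if method == "Network.loadingFailed":
        return "network-failed: " + params.get("errorText", "unknown")
    if method == "Network.responseReceived":
        status = params.get("response", {}).get("status")
        if status is not None and status >= 400:
            return f"http-{status}"
    return None  # not a failure signal this sketch cares about
```

Each of the four failure modes in the list corresponds to a structured message here rather than something inferred from pixels, which is the point of the pattern.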
Three-signal fusion¶
On Phoenix.new specifically, the agent fuses three signal streams from the same running session:
- Browser DOM + JS state (this concept): via a full Chrome instance that the agent drives over CDP.
- Server-side application logs: the Phoenix app running on the same VM streams logs to the agent.
- Test runner output: `mix test` exit codes and assertion diffs.
The 2025-06-20 post: "When Phoenix.new boots an app, it watches the logs, and tests the application. When an action triggers an error, Phoenix.new notices and gets to work." The browser + logs + tests triangulate a failure in a way any single signal can't.
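The triangulation idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Phoenix.new's implementation: the `Signals` shape, the `triage` function, and its category strings are all hypothetical, and matching on `[error]` assumes the server log lines carry a bracketed level tag.

```python
from dataclasses import dataclass, field


@dataclass
class Signals:
    """One observation window across the three streams (hypothetical shape)."""
    browser_errors: list = field(default_factory=list)    # e.g. console exceptions
    server_log_lines: list = field(default_factory=list)  # tail of the app log
    test_exit_code: int = 0                                # e.g. from `mix test`


def triage(s):
    """Return a coarse guess at where a failure lives, or 'healthy'."""
    # Assumption: error-level log lines contain the literal tag "[error]".
    server_failed = any("[error]" in line for line in s.server_log_lines)
    browser_failed = bool(s.browser_errors)
    tests_failed = s.test_exit_code != 0

    if server_failed and browser_failed:
        return "server error surfaced in the browser"
    if server_failed:
        return "server-side failure"
    if browser_failed:
        return "client-side failure"
    if tests_failed:
        return "regression caught only by tests"
    return "healthy"
```

The design point is that the combinations carry information: a browser error with a clean server log points at client code, while the same browser error alongside a server error points at the backend.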
Contrast with adjacent shapes¶
- Screenshot-iterating agents (early Cursor agents, some Devin-like surfaces) — rasterised feedback; lower signal density.
- CDP-over-network agents (Cloudflare Browser Rendering — systems/cloudflare-browser-rendering — via patterns/cdp-proxy-for-headless-browser) — same signal surface; the browser lives in a tenant-scoped service rather than colocated in the session VM. Cloudflare's MoltWorker (2026-01-29) is the canonical proxied instance; Phoenix.new (2025-06-20) is the colocated instance.
- Playwright-MCP agents — same CDP signal surface, delivered via an MCP tool. Narrower interface (specific high-level actions) vs. Phoenix.new's "agent drives CDP directly in the same VM".
Caveats¶
- The post says "agent tools" plural without enumerating which specific browser APIs (DOM query? eval? network interception? console tail?) the agent has access to. The disclosure is at the capability-category level.
- Context-window cost. In context-window terms, DOM dumps are much larger than screenshots: a full-page DOM on a rich app is easily tens of KB of markup, which is thousands of tokens. Real agent prompts likely sample ("just the button's `disabled` attribute") rather than dumping.
- CDP driving works well for happy-path CSS / HTML assertions but is clumsy for visual regressions (a pixel-level diff of a chart rendering); screenshots still matter for that slice.
- Phoenix.new's UI explicitly exposes the browser as a live preview, so the human can watch the agent drive it. That's a UX affordance on top of the agent-tooling posture.
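The sampling idea in the context-window caveat above can be sketched as follows. This is a minimal Python illustration using only the standard library's `html.parser`; the `AttributeProbe` class, the `sample_attribute` helper, and the summary format are hypothetical. A real agent would more plausibly run the query inside the live browser than parse a serialised dump, so treat this purely as a demonstration of the size difference between a full document and a targeted answer.

```python
from html.parser import HTMLParser


class AttributeProbe(HTMLParser):
    """Collect the attributes of the first element with a given id."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.found = None  # dict of attributes once the element is seen

    def handle_starttag(self, tag, attrs):
        if self.found is None:
            attr_map = dict(attrs)
            if attr_map.get("id") == self.element_id:
                self.found = attr_map


def sample_attribute(html, element_id, name):
    """Return a one-line summary instead of the whole document."""
    probe = AttributeProbe(element_id)
    probe.feed(html)
    if probe.found is None:
        return f"#{element_id}: not found"
    # Boolean attributes such as `disabled` parse with a value of None,
    # so test for presence of the key rather than truthiness of the value.
    present = name in probe.found
    return f"#{element_id}: {name}={'present' if present else 'absent'}"


# A synthetic page: many filler rows plus the one button of interest.
page = (
    "<html><body>"
    + "<div class='row'>x</div>" * 500
    + "<button id='save' disabled>Save</button></body></html>"
)
summary = sample_attribute(page, "save", "disabled")
```

Here `page` is over ten thousand characters, while `summary` is a single short line, which is the trade the caveat describes.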
Seen in¶
- sources/2025-06-20-flyio-phoenixnew-remote-ai-runtime-for-phoenix — canonical statement. Full Chrome per session VM, driven by the agent, surfaced in the UI as a live preview.
Related¶
- concepts/agentic-development-loop — the browser is one of three signal streams in the loop.
- systems/phoenix-new — canonical production instance.
- systems/chrome-devtools-protocol — the wire protocol.
- systems/playwright — common higher-level client built on CDP.
- patterns/agent-driven-headless-browser — the design pattern this concept describes.
- patterns/cdp-proxy-for-headless-browser — adjacent pattern delivering the same signal surface via a proxied endpoint.