CONCEPT Cited by 1 source

Agent-driven browser¶

Definition¶

An agent-driven browser is an agent-tooling pattern where an LLM agent controls a full browser (DOM + JavaScript state + navigation + event injection + network + console) as a first-class tool, rather than iterating on rasterised screenshots of a UI. The agent sees, programmatically, the same page-state surface a developer would see in Chrome DevTools.

Canonical wiki statement¶

Fly.io, 2025-06-20, on Phoenix.new:

Phoenix.new includes, in both its UI and its agent tools, a full browser. The Phoenix.new agent uses that browser 'headlessly' to check its own front-end changes and interact with the app. Because it's a full browser, instead of trying to iterate on screenshots, the agent sees real page content and JavaScript state – with or without a human present.

(Source: sources/2025-06-20-flyio-phoenixnew-remote-ai-runtime-for-phoenix)

The load-bearing clause is "instead of trying to iterate on screenshots, the agent sees real page content and JavaScript state."

What the signal difference buys¶

A screenshot is a rasterisation: pixels only. An agent reading one is effectively doing image-to-text on a rendering of the DOM and inferring the structure behind it. That works for simple pages but fails on:

Dynamic content — an ambient loading spinner looks the same pixel-wise whether the network request succeeded or failed.
Form state — whether an input is validated, whether a button is disabled, whether a dropdown is open, is all attribute-level DOM state that doesn't necessarily show up visually.
JavaScript errors — an exception logged to console but not surfaced to the user is invisible in a screenshot; it's one console-log access away in a full browser.
Network state — XHR / WebSocket traffic is a key signal for LiveView / Phoenix Channels-heavy apps and lives entirely outside the screenshot.
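
The four gaps above map onto Chrome DevTools Protocol (CDP) domains. The following is a minimal sketch, not Phoenix.new's actual tooling (the post does not enumerate it): it builds the JSON commands an agent might send and sorts incoming CDP events into the signals a screenshot cannot carry. The helper names (`cdp_command`, `form_state_probe`, `classify_event`) are hypothetical; the CDP method and event names are real. No browser connection is opened here, only the message shapes are shown.

```python
import itertools
import json

_ids = itertools.count(1)  # CDP commands carry a caller-chosen integer id


def cdp_command(method, params=None):
    """Serialise one CDP command as the JSON text sent over the DevTools WebSocket."""
    return json.dumps({"id": next(_ids), "method": method, "params": params or {}})


# Commands that switch on the non-visual event streams.
SETUP = [
    cdp_command("Runtime.enable"),  # console calls and uncaught exceptions
    cdp_command("Network.enable"),  # HTTP and WebSocket traffic events
]


def form_state_probe(selector):
    """Build a Runtime.evaluate command that reads attribute-level state."""
    expr = (
        "(() => { const el = document.querySelector(%s);"
        " return el ? {disabled: !!el.disabled,"
        " valid: el.checkValidity ? el.checkValidity() : null} : null; })()"
        % json.dumps(selector)  # json.dumps yields a safely quoted JS string literal
    )
    return cdp_command("Runtime.evaluate", {"expression": expr, "returnByValue": True})


# Which CDP event carries which screenshot-invisible signal.
EVENT_KINDS = {
    "Runtime.consoleAPICalled": "console",
    "Runtime.exceptionThrown": "js_error",
    "Network.responseReceived": "network",
    "Network.loadingFailed": "network_failure",
    "Network.webSocketFrameReceived": "websocket",
}


def classify_event(raw):
    """Map one incoming CDP message to a signal kind ('other' if unrecognised)."""
    msg = json.loads(raw)
    return EVENT_KINDS.get(msg.get("method"), "other")
```

For example, `classify_event('{"method": "Runtime.exceptionThrown", "params": {}}')` returns `"js_error"`, which is the signal a screenshot-only agent would never receive.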

Three-signal fusion¶

On Phoenix.new specifically, the agent fuses three signal streams from the same running session:

Browser DOM + JS state (this concept) — via a CDP-driven full Chrome the agent operates.
Server-side application logs — the Phoenix app running on the same VM streams logs to the agent.
Test runner output — mix test exit codes and assertion diffs.

The 2025-06-20 post: "When Phoenix.new boots an app, it watches the logs, and tests the application. When an action triggers an error, Phoenix.new notices and gets to work." The browser + logs + tests triangulate a failure in a way any single signal can't.
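
To make the triangulation idea concrete, here is a rough sketch of how three streams could be combined into one coarse diagnosis. Everything in it is an assumption for illustration: the `SessionSignals` type, the `triangulate` function, the priority order, and the `[error]` log tag (borrowed from Elixir's default Logger format). The post does not describe how Phoenix.new actually combines its signals.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SessionSignals:
    """One observation window across the three streams (hypothetical shape)."""
    console_errors: List[str] = field(default_factory=list)    # browser stream
    server_log_lines: List[str] = field(default_factory=list)  # app log stream
    test_exit_code: int = 0                                     # test runner stream


def triangulate(signals: SessionSignals) -> str:
    """Return a coarse guess at where a failure originates."""
    browser_bad = bool(signals.console_errors)
    # Assumption: server-side errors are tagged "[error]" in the log line.
    server_bad = any("[error]" in line for line in signals.server_log_lines)
    tests_bad = signals.test_exit_code != 0

    if server_bad:
        # A server error usually explains any browser symptom seen alongside it.
        return "server"
    if browser_bad:
        # Browser errors with a clean server log point at client-side code.
        return "client"
    if tests_bad:
        # Only the test runner noticed: likely a regression off the exercised path.
        return "tests-only"
    return "healthy"
```

The point of the ordering is that a console error on its own is ambiguous, while the same console error next to a clean server log narrows the search to client-side code.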

Contrast with adjacent shapes¶

Screenshot-iterating agents (early Cursor agents, some Devin-like surfaces) — rasterised feedback; lower signal density.
CDP-over-network agents (Cloudflare Browser Rendering — systems/cloudflare-browser-rendering — via patterns/cdp-proxy-for-headless-browser) — same signal surface; the browser lives in a tenant-scoped service rather than colocated in the session VM. Cloudflare's MoltWorker (2026-01-29) is the canonical proxied instance; Phoenix.new (2025-06-20) is the colocated instance.
Playwright-MCP agents — same CDP signal surface, delivered via an MCP tool. Narrower interface (specific high-level actions) vs. Phoenix.new's "agent drives CDP directly in the same VM".

Caveats¶

The post says "agent tools" plural without enumerating which specific browser APIs (DOM query? eval? network interception? console tail?) the agent has access to. The disclosure is at the capability-category level.
Context-window cost. In token terms, DOM dumps can cost far more context than a screenshot. The serialised full-page DOM of a rich app easily runs to tens of kilobytes of markup, which is thousands of tokens. Real agent prompts likely sample ("just the button's disabled attribute") rather than dumping.
CDP driving works well for happy-path CSS / HTML assertions but is clumsy for visual regressions (pixel-level diff of a chart rendering); screenshots still matter for that slice.
Phoenix.new's UI explicitly exposes the browser as a live preview, so the human can watch the agent drive it. That's a UX affordance on top of the agent-tooling posture.
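
The context-window caveat suggests sampling the DOM instead of dumping it. A minimal sketch of that idea, using only the Python standard library on an HTML string: the `AttributeSampler` class, the `sample_element` helper, and the synthetic `PAGE` are all hypothetical, and a real agent would more plausibly run an equivalent query inside the page over CDP. This is not a description of what Phoenix.new does.

```python
from html.parser import HTMLParser


class AttributeSampler(HTMLParser):
    """Collect the tag and attributes of the first element with a given id."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.found = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)  # boolean attributes such as `disabled` map to None
        if self.found is None and attr_map.get("id") == self.element_id:
            self.found = {"tag": tag, **attr_map}


def sample_element(html, element_id):
    """Return a small dict describing one element, or None if it is absent."""
    sampler = AttributeSampler(element_id)
    sampler.feed(html)
    return sampler.found


# Synthetic page: lots of filler markup around the one element of interest.
PAGE = (
    "<form>"
    + "<div class='row'>filler</div>" * 500
    + "<button id='save' disabled>Save</button></form>"
)

summary = sample_element(PAGE, "save")
# `summary` holds only the button's tag and attributes; the presence of the
# "disabled" key is the fact the agent needs, at a tiny fraction of len(PAGE).
```

Passing `summary` to the model instead of `PAGE` keeps the prompt small while still answering the question a screenshot could not.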

Seen in¶

sources/2025-06-20-flyio-phoenixnew-remote-ai-runtime-for-phoenix — canonical statement. Full Chrome per session VM, driven by the agent, surfaced in the UI as a live preview.

Related¶

concepts/agentic-development-loop — the browser is one of three signal streams in the loop.
systems/phoenix-new — canonical production instance.
systems/chrome-devtools-protocol — the wire protocol.
systems/playwright — common higher-level client built on CDP.
patterns/agent-driven-headless-browser — the design pattern this concept describes.
patterns/cdp-proxy-for-headless-browser — adjacent pattern delivering the same signal surface via a proxied endpoint.