PATTERN Cited by 1 source

Agent-driven headless browser¶

Shape¶

Give a coding agent a full browser as a first-class tool, usually a headless Chrome driven via CDP or Playwright, and let the agent verify its own front-end changes by operating the page directly (inspecting the DOM, reading JavaScript state, triggering events, reading console output, watching network traffic) rather than by taking and comparing screenshots.

Why it exists¶

Screenshot-iterating agents work for simple pages but fail on:

Dynamic state invisible to the rasteriser (validation state, disabled buttons, open menus).
JavaScript errors logged to console but not visually surfaced.
Network state (WebSocket / XHR) irrelevant to pixel output but critical for real-time apps.
Live-reload-style frameworks (Phoenix LiveView, Phoenix Channels) where the pixel output is a consequence of correct server-push state.

Giving the agent the same programmatic surface a developer would use in Chrome DevTools closes those gaps.
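
As an illustration of the signals listed above, here is a minimal sketch that assumes Playwright for Python as the client. The pattern does not mandate Playwright, and the source does not say what Phoenix.new uses internally. The names `PageSignals`, `collect_signals`, and `summarize`, and the selector passed in, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PageSignals:
    """Non-visual signals an agent might gather in one page visit."""
    console_errors: List[str] = field(default_factory=list)
    failed_requests: List[str] = field(default_factory=list)
    websocket_urls: List[str] = field(default_factory=list)
    submit_disabled: Optional[bool] = None


def collect_signals(url: str, submit_selector: str) -> PageSignals:
    """Load `url` in headless Chromium and record what a screenshot would miss."""
    # Imported lazily so summarize() below works without Playwright installed.
    from playwright.sync_api import sync_playwright

    signals = PageSignals()

    def on_console(msg):
        # Keep only error-level console messages.
        if msg.type == "error":
            signals.console_errors.append(msg.text)

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Register listeners before navigating so early events are captured.
        page.on("console", on_console)
        page.on("pageerror", lambda exc: signals.console_errors.append(str(exc)))
        page.on("requestfailed", lambda req: signals.failed_requests.append(req.url))
        page.on("websocket", lambda ws: signals.websocket_urls.append(ws.url))
        page.goto(url)
        # Dynamic state that is awkward to recover from pixels alone.
        signals.submit_disabled = page.locator(submit_selector).is_disabled()
        browser.close()
    return signals


def summarize(signals: PageSignals) -> str:
    """Render a compact text report suitable for feeding back to the model."""
    lines = [
        f"console errors: {len(signals.console_errors)}",
        f"failed requests: {len(signals.failed_requests)}",
        f"websockets opened: {len(signals.websocket_urls)}",
        f"submit disabled: {signals.submit_disabled}",
    ]
    # Include at most three error messages to keep the report small.
    lines += [f"  error: {e}" for e in signals.console_errors[:3]]
    return "\n".join(lines)
```

An agent loop might call `collect_signals("http://localhost:4000/signup", "button[type=submit]")` after each edit (both arguments are placeholders) and pass the `summarize` output back to the model instead of a screenshot.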

Canonical instances¶

Colocated (browser lives in the agent's VM)¶

Phoenix.new (Fly.io, 2025-06-20) — every session VM ships a full Chrome the agent drives. From the post: "The Phoenix.new agent uses that browser 'headlessly' to check its own front-end changes and interact with the app. Because it's a full browser, instead of trying to iterate on screenshots, the agent sees real page content and JavaScript state – with or without a human present." The UI simultaneously exposes the browser as a live preview for the human to watch.

Proxied (browser lives in a platform endpoint)¶

Cloudflare MoltWorker / Browser Rendering (2026-01-29) via patterns/cdp-proxy-for-headless-browser — the browser endpoint is a tenant-scoped service the agent hits over CDP-over-network. Same signal surface, different deployment shape: multi-tenant at the platform level; session-local at the tenant level.
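
The "same signal surface" point can be made concrete at the wire level: colocated or proxied, the agent sends the same JSON-encoded CDP commands over a WebSocket, and only the endpoint differs. The sketch below shows just the message framing. The class name `CdpSession`, the helper `classify`, and both endpoint URLs are hypothetical placeholders, and no real connection is opened.

```python
import itertools
import json
from typing import Optional


class CdpSession:
    """Encodes CDP commands for one endpoint; transport is out of scope here."""

    def __init__(self, endpoint: str):
        self.endpoint = endpoint
        self._ids = itertools.count(1)  # CDP pairs responses to commands by id

    def encode(self, method: str, params: Optional[dict] = None) -> str:
        """Build the JSON text frame for a single CDP command."""
        return json.dumps(
            {"id": next(self._ids), "method": method, "params": params or {}}
        )


def classify(raw: str) -> str:
    """Responses carry the command's id; events carry a method and no id."""
    message = json.loads(raw)
    return "response" if "id" in message else "event"


# Placeholder endpoints: the deployment shape changes the URL, not the messages.
colocated = CdpSession("ws://127.0.0.1:9222/devtools/browser/PLACEHOLDER")
proxied = CdpSession("wss://browser.example.com/cdp")

expression = "document.querySelector('form').checkValidity()"
for session in (colocated, proxied):
    frame = session.encode(
        "Runtime.evaluate", {"expression": expression, "returnByValue": True}
    )
    print(session.endpoint, frame)
```

A real client would send each frame over a WebSocket connection to `endpoint` and match replies by `id`. Higher-level clients such as Playwright handle that bookkeeping themselves.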

MCP-wrapped (browser exposed through an MCP tool server)¶

Playwright MCP / browser-mcp variants — same CDP substrate surfaced as a narrower MCP tool interface. Agent calls high-level actions ("click selector X", "fill input Y") rather than dropping into raw CDP.
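
To illustrate what a "narrower" interface means in practice, here is a sketch of an allowlisted tool dispatcher. It is not the Playwright MCP API: the tool names, the call shape, and `FakePage` (a stand-in so the example runs without a browser) are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict


class FakePage:
    """Stand-in for a browser page; records actions instead of performing them."""

    def __init__(self):
        self.log = []

    def click(self, selector: str) -> None:
        self.log.append(("click", selector))

    def fill(self, selector: str, value: str) -> None:
        self.log.append(("fill", selector, value))


def make_tools(page) -> Dict[str, Callable[[dict], Any]]:
    """The allowlist: only these high-level actions are reachable by the agent."""
    return {
        "click": lambda args: page.click(args["selector"]),
        "fill": lambda args: page.fill(args["selector"], args["value"]),
    }


def dispatch(tools: Dict[str, Callable[[dict], Any]], call: dict) -> dict:
    """Run one tool call, rejecting anything outside the allowlist."""
    name = call.get("name")
    if name not in tools:
        return {"ok": False, "error": f"tool not allowed: {name}"}
    try:
        tools[name](call.get("arguments", {}))
    except KeyError as missing:
        return {"ok": False, "error": f"missing argument: {missing.args[0]}"}
    return {"ok": True}


page = FakePage()
tools = make_tools(page)
print(dispatch(tools, {"name": "click", "arguments": {"selector": "#save"}}))
print(dispatch(tools, {"name": "evaluate", "arguments": {"expression": "1"}}))
```

The second call is rejected because arbitrary evaluation is not on the allowlist. That is the trade: less power than raw CDP, in exchange for a smaller and more predictable surface.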

Trade-offs¶

Axis	Colocated browser	Proxied browser	MCP-wrapped
Latency to first byte	Intra-VM (microseconds)	Network (tens of ms)	Network + MCP round trip
Tenant isolation	Per-session VM	Platform handles	Per-client session
Resource cost	Chrome per session VM	Shared Chrome fleet	Varies
Interface width	Full CDP	Full CDP	Narrower (allowlisted actions)
Best fit	Coding agent in cloud IDE	Ephemeral scraping / extraction	Cross-vendor agent frameworks

Context-window cost considerations¶

In token terms, DOM dumps can be much larger than screenshots. A full-page DOM on a rich LiveView or React app is easily tens of kilobytes of markup, which translates to many thousands of tokens. Real agent implementations sample ("just the attribute of this element" or "just the matches for this selector") rather than dumping entire documents. The 2025-06-20 post doesn't disclose Phoenix.new's specific sampling strategy.
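
A minimal sketch of selective extraction, using only the Python standard library so it runs without a browser. It is one possible sampling approach, not Phoenix.new's (which is undisclosed). The names `AttributeSampler` and `sample_dom` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class AttributeSampler(HTMLParser):
    """Collect chosen attributes from elements of one tag; ignore everything else."""

    def __init__(self, tag: str, wanted: List[str]):
        super().__init__()
        self.tag = tag
        self.wanted = wanted
        self.matches: List[Dict[str, Optional[str]]] = []

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        present = dict(attrs)  # boolean attributes such as `disabled` map to None
        self.matches.append({k: present[k] for k in self.wanted if k in present})


def sample_dom(html: str, tag: str, wanted: List[str], max_matches: int = 20):
    """Return at most `max_matches` attribute dicts instead of the whole document."""
    sampler = AttributeSampler(tag, wanted)
    sampler.feed(html)
    return sampler.matches[:max_matches]


# Hypothetical page: a small form buried in a large amount of unrelated markup.
page_html = (
    '<form><input name="email" aria-invalid="true">'
    '<button type="submit" disabled>Save</button></form>'
    + "<div>filler</div>" * 2000
)

sampled = sample_dom(page_html, "button", ["type", "disabled"])
print(len(page_html), "characters in the full DOM")
print(sampled)
```

With a live browser the filtering would more likely run inside the page, for example a selector query evaluated through the CDP client, so the full document never leaves the browser. The principle is the same: return the few attributes the agent asked about, with a cap on the number of matches.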

Implementation ingredients¶

Browser runtime — usually Chromium or Chrome in headless mode, though Phoenix.new's UI exposes the browser window as a visible preview too.
CDP client — raw CDP, Playwright, or a narrower MCP wrapper.
Agent tooling layer — the prompt-chain that teaches the agent which CDP operations are worth using and when.
Context-window-aware sampling — selective DOM / state extraction rather than full-page dumps.

Caveats¶

Not a substitute for visual-regression testing. Pixel-level visual diffs (chart rendering, typography) still want screenshots or dedicated image-diff tools.
Context cost is non-trivial. Bad sampling can blow context windows on a single page.
Network and auth posture inherited from the browser's host. A colocated browser has the VM's reach; a proxied browser has the platform endpoint's reach. Credentials in the loop must match the deployment shape.

Adjacent patterns¶

patterns/ephemeral-vm-as-cloud-ide — colocated browsers naturally live in the same VM as the agent.
patterns/cdp-proxy-for-headless-browser — the proxied variant.

Seen in¶

sources/2025-06-20-flyio-phoenixnew-remote-ai-runtime-for-phoenix — canonical colocated instance.

Related¶

concepts/agent-driven-browser — the concept this pattern implements.
concepts/agentic-development-loop — the closed loop this browser feeds.
systems/phoenix-new — canonical colocated production instance.
systems/chrome-devtools-protocol — the wire protocol.
systems/playwright — common higher-level CDP client.
systems/cloudflare-browser-rendering — proxied-browser platform endpoint.
patterns/cdp-proxy-for-headless-browser — sibling proxied pattern.