PATTERN Cited by 1 source
Agent-driven headless browser¶
Shape¶
Give a coding agent a full browser as a first-class tool, usually a headless Chrome driven via CDP or Playwright, and let the agent verify its own front-end changes by operating the page directly (inspecting the DOM, reading JavaScript state, triggering events, reading console output, watching network traffic) rather than by taking and comparing screenshots.
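To make "operating the page directly" concrete, here is a minimal sketch of the JSON messages an agent's tool layer might send over a CDP WebSocket to check one element. `Runtime.enable`, `Network.enable`, and `Runtime.evaluate` are real CDP methods; the helper names `cdp_command` and `verification_commands`, and the choice to check the `disabled` property, are assumptions made for illustration. No source describes any particular agent sending these exact commands.

```python
import itertools
import json

# CDP expects a caller-chosen integer id on every command so that
# responses can be matched back to requests.
_ids = itertools.count(1)


def cdp_command(method: str, **params) -> str:
    """Serialize one CDP command as the JSON text sent over the WebSocket."""
    return json.dumps({"id": next(_ids), "method": method, "params": params})


def verification_commands(selector: str) -> list:
    """Commands an agent might send to check one element (hypothetical helper)."""
    quoted = json.dumps(selector)  # embed the selector safely as a JS string literal
    return [
        cdp_command("Runtime.enable"),  # start receiving console and exception events
        cdp_command("Network.enable"),  # start receiving request and WebSocket events
        cdp_command(
            "Runtime.evaluate",
            # Read live JavaScript state instead of inferring it from pixels.
            expression=f"document.querySelector({quoted})?.disabled ?? null",
            returnByValue=True,
        ),
    ]
```

Each string would be written to the browser's debugging WebSocket; the reply to the `Runtime.evaluate` command carries the element's actual `disabled` value, which a screenshot can only hint at.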
Why it exists¶
Screenshot-iterating agents work for simple pages but fail on:
- Dynamic state invisible to the rasteriser (validation state, disabled buttons, open menus).
- JavaScript errors logged to console but not visually surfaced.
- Network state (WebSocket / XHR) irrelevant to pixel output but critical for real-time apps.
- Live-reload-style frameworks (Phoenix LiveView, Phoenix Channels), where the pixel output is a consequence of correct server-push state.
Giving the agent the same programmatic surface a developer would use in Chrome DevTools closes those gaps.
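The gaps listed above map onto CDP events that never reach a rasteriser. The sketch below turns a few of them into one-line signals an agent could read. The event names (`Runtime.exceptionThrown`, `Runtime.consoleAPICalled`, `Network.loadingFailed`, `Network.webSocketFrameReceived`) are real CDP events; the function `summarize_event`, its output format, and the 80-character truncation are assumptions for illustration only.

```python
from typing import Optional


def summarize_event(event: dict) -> Optional[str]:
    """Reduce one decoded CDP event to a short signal, or None if uninteresting."""
    method = event.get("method")
    params = event.get("params", {})

    if method == "Runtime.exceptionThrown":
        # Uncaught JavaScript error: logged, but usually not visible on the page.
        details = params.get("exceptionDetails", {})
        return f"js-exception: {details.get('text', 'unknown')}"

    if method == "Runtime.consoleAPICalled" and params.get("type") == "error":
        # console.error(...) call; join the primitive argument values.
        text = " ".join(str(arg.get("value", "")) for arg in params.get("args", []))
        return f"console-error: {text}"

    if method == "Network.loadingFailed":
        return f"request-failed: {params.get('errorText', 'unknown')}"

    if method == "Network.webSocketFrameReceived":
        # Server-push traffic; truncate so one frame cannot flood the context.
        payload = params.get("response", {}).get("payloadData", "")
        return f"ws-frame: {payload[:80]}"

    return None  # everything else is ignored in this sketch
```

Filtering to a handful of event types is a design choice here, not a requirement: a real tool layer would decide which events matter for the app under test.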
Canonical instances¶
Colocated (browser lives in the agent's VM)¶
Phoenix.new (Fly.io, 2025-06-20) — every session VM ships a full Chrome the agent drives. From the post: "The Phoenix.new agent uses that browser 'headlessly' to check its own front-end changes and interact with the app. Because it's a full browser, instead of trying to iterate on screenshots, the agent sees real page content and JavaScript state – with or without a human present." The UI simultaneously exposes the browser as a live preview for the human to watch.
Proxied (browser lives in a platform endpoint)¶
Cloudflare MoltWorker / Browser Rendering (2026-01-29) via patterns/cdp-proxy-for-headless-browser — the browser endpoint is a tenant-scoped service the agent hits over CDP-over-network. Same signal surface, different deployment shape: multi-tenant at the platform level; session-local at the tenant level.
MCP-wrapped (browser exposed through an MCP tool server)¶
Playwright MCP / browser-mcp variants — same CDP substrate surfaced as a narrower MCP tool interface. Agent calls high-level actions ("click selector X", "fill input Y") rather than dropping into raw CDP.
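The "narrower interface" idea can be sketched as an allowlist that translates a few high-level actions into a single underlying CDP call and rejects everything else. This is not how Playwright MCP or any browser-mcp variant is implemented; the action names `click` and `fill`, the function `translate_action`, and the use of `Runtime.evaluate` as the only backing call are all assumptions chosen to keep the example small.

```python
import json


def _js_click(selector: str) -> str:
    """JavaScript that clicks the first element matching the selector."""
    return f"document.querySelector({json.dumps(selector)}).click()"


def _js_fill(selector: str, value: str) -> str:
    """JavaScript that sets an input's value and fires an 'input' event."""
    return (
        f"(el => {{ el.value = {json.dumps(value)}; "
        f"el.dispatchEvent(new Event('input', {{bubbles: true}})); }})"
        f"(document.querySelector({json.dumps(selector)}))"
    )


# Hypothetical allowlist: only these actions are exposed to the agent.
ALLOWED_ACTIONS = {"click": _js_click, "fill": _js_fill}


def translate_action(name: str, **kwargs) -> dict:
    """Map an allowlisted high-level action to one CDP command body."""
    if name not in ALLOWED_ACTIONS:
        raise PermissionError(f"action {name!r} is not allowlisted")
    expression = ALLOWED_ACTIONS[name](**kwargs)
    return {"method": "Runtime.evaluate", "params": {"expression": expression}}
```

With this shape, `translate_action("click", selector="#save")` yields a `Runtime.evaluate` command, while an attempt to call an unlisted action such as `"evaluate"` raises `PermissionError`. That is the trade the wrapper makes: less expressive power in exchange for a smaller, auditable surface.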
Trade-offs¶
| Axis | Colocated browser | Proxied browser | MCP-wrapped |
|---|---|---|---|
| Latency to first byte | Intra-VM (microseconds) | Network (tens of ms) | Network + MCP roundtrip |
| Tenant isolation | Per-session VM | Platform handles | Per-client session |
| Resource cost | Chrome per session VM | Shared Chrome fleet | Varies |
| Interface width | Full CDP | Full CDP | Narrower (allowlisted actions) |
| Best fit | Coding agent in cloud IDE | Ephemeral scraping / extraction | Cross-vendor agent frameworks |
Context-window cost considerations¶
In context-window terms, DOM dumps are much larger than screenshots. A full page DOM on a rich LiveView or React app is easily tens of KB of markup, which translates to thousands of tokens. Real agent implementations sample ("just the attribute of this element" or "just the matches for this selector") rather than dumping entire documents. The 2025-06-20 post doesn't disclose Phoenix.new's specific sampling strategy.
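A minimal sketch of the sampling idea, using only Python's standard `html.parser`: instead of handing the agent the whole document, return just the requested attributes of elements with a given tag. It works on serialized HTML rather than a live DOM, matches by tag name only, and the names `AttributeSampler` and `sample` are hypothetical; it illustrates the principle and is not any product's actual strategy.

```python
from html.parser import HTMLParser


class AttributeSampler(HTMLParser):
    """Collect selected attributes from every element with a given tag name."""

    def __init__(self, tag: str, attrs_wanted: list):
        super().__init__()
        self.tag = tag
        self.attrs_wanted = attrs_wanted
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        found = dict(attrs)  # valueless attributes (e.g. disabled) map to None
        self.matches.append(
            {name: found[name] for name in self.attrs_wanted if name in found}
        )


def sample(html: str, tag: str, attrs_wanted: list) -> list:
    """Return only the requested attributes for elements matching the tag."""
    sampler = AttributeSampler(tag, attrs_wanted)
    sampler.feed(html)
    return sampler.matches
```

For the input `'<form><input id="email"><button id="save" disabled>Save</button></form>'`, calling `sample(html, "button", ["id", "disabled"])` returns `[{'id': 'save', 'disabled': None}]`: the presence of the `disabled` key is the signal, and the payload stays a few dozen characters regardless of how large the page is.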
Implementation ingredients¶
- Browser runtime — usually Chromium or Chrome in headless mode, though Phoenix.new's UI exposes the browser window as a visible preview too.
- CDP client — raw CDP, Playwright, or a narrower MCP wrapper.
- Agent tooling layer — the prompt-chain that teaches the agent which CDP operations are worth using and when.
- Context-window-aware sampling — selective DOM / state extraction rather than full-page dumps.
Caveats¶
- Not a substitute for visual-regression testing. Pixel-level visual diffs (chart rendering, typography) still want screenshots or dedicated image-diff tools.
- Context cost is non-trivial. Bad sampling can blow context windows on a single page.
- Network and auth posture is inherited from the browser's host. A colocated browser has the VM's reach; a proxied browser has the platform endpoint's reach. Credentials in the loop must match the deployment shape.
Adjacent patterns¶
- patterns/ephemeral-vm-as-cloud-ide — colocated browsers naturally live in the same VM as the agent.
- patterns/cdp-proxy-for-headless-browser — the proxied variant.
Seen in¶
- sources/2025-06-20-flyio-phoenixnew-remote-ai-runtime-for-phoenix — canonical colocated instance.
Related¶
- concepts/agent-driven-browser — the concept this pattern implements.
- concepts/agentic-development-loop — the closed loop this browser feeds.
- systems/phoenix-new — canonical colocated production instance.
- systems/chrome-devtools-protocol — the wire protocol.
- systems/playwright — common higher-level CDP client.
- systems/cloudflare-browser-rendering — proxied-browser platform endpoint.
- patterns/cdp-proxy-for-headless-browser — sibling proxied pattern.