

Code generation over tool calls

Pattern

Instead of presenting an LLM agent with N individually-described MCP tool schemas and asking it to "pick a tool and fill in its parameters," convert the tool surface into a typed API in a popular language (usually TypeScript) and ask the model to write code that uses the API. A sandboxed runtime executes the code and returns the final result.
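A minimal sketch of the inversion, in plain TypeScript. The API names (`listZones`, `purgeCache`) and the synchronous stub implementation are illustrative only, not the real surface of any product:

```typescript
// The tool surface exposed to the model: one typed API instead of N
// individually-described tool schemas. Names here are made up.
interface DnsApi {
  listZones(): { id: string; name: string }[];
  purgeCache(zoneId: string): boolean;
}

// Host-side binding the sandbox wires the API to (stubbed for the sketch).
const api: DnsApi = {
  listZones: () => [{ id: "z1", name: "example.com" }],
  purgeCache: (zoneId) => zoneId.startsWith("z"),
};

// What the model emits: ordinary TypeScript against the typed API, not a
// JSON tool-call payload. The sandbox executes it and returns only the
// final value; intermediate results never reach the planner.
function generatedScript(api: DnsApi): string[] {
  return api
    .listZones()
    .filter((z) => api.purgeCache(z.id))
    .map((z) => z.name);
}

const result = generatedScript(api);
```

The key move is that the model writes a function call in a language it has seen billions of times, and the host only ever sees `result`.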

Why it works

Three load-bearing arguments from production deployments:

  1. Training-data distribution. LLMs have seen "a huge amount of real-world TypeScript but very few tool call examples," so they are measurably more accurate at writing a function call in code than filling in a JSON tool schema (Source: Agent Lee launch post).
  2. Context-window compression. A tool schema per operation inflates the context window linearly with the API surface. A typed API description compresses N operations into one schema — Cloudflare fits its ~3,000-operation API into <1,000 tokens via Code Mode (Source: sources/2026-04-13-cloudflare-building-a-cli-for-all-of-cloudflare). Same framing re-quantified against the naive baseline in the 2026-04-15 Project Think launch: "~1,000 tokens vs ~1.17 million tokens for the naive tool-per-endpoint equivalent — 99.9% reduction" (Source: sources/2026-04-15-cloudflare-project-think-building-the-next-generation-of-ai-agents). Cloudflare's internal MCP Server Portal collapsed 34 upstream GitLab tools (~15K tokens / ~7.5% of a 200K window) behind a constant-size 2-tool portal surface via the same pattern (Source: sources/2026-04-20-cloudflare-internal-ai-engineering-stack).
  3. Round-trip collapse. A multi-step task that would take N planner↔tool turns collapses into one generated script whose intermediate results stay inside the sandbox. The model returns only the final answer, so the network and planner latency that previously scaled with N is paid once (Agent Lee post).
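The scaling claim in point 2 can be sketched with a toy measurement, using character counts as a stand-in for tokens and made-up operation names:

```typescript
// Toy illustration of context-cost scaling: N per-operation JSON tool
// schemas vs one typed API declaration for the same N operations.
const ops = Array.from({ length: 50 }, (_, i) => `op${i}`);

// Naive baseline: one JSON schema object per operation.
const toolSchemas = ops.map((name) =>
  JSON.stringify({
    name,
    description: `Invoke ${name}`,
    inputSchema: { type: "object", properties: { id: { type: "string" } } },
  })
);
const schemaCost = toolSchemas.join("\n").length;

// Code mode: the same surface described once as a typed interface.
const typedApi =
  "interface Api {\n" +
  ops.map((name) => `  ${name}(id: string): unknown;`).join("\n") +
  "\n}";
const typedCost = typedApi.length;
```

The interface in this sketch still grows with N, just at a much smaller constant factor; deployments like Agent Lee go further and keep the cost roughly constant in N by fronting the full API with a fixed 2-tool surface.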

When to reach for it

  • API surface is large (hundreds to thousands of operations) — per-tool schemas don't fit in context.
  • Tasks are frequently multi-step — chaining in code beats chaining by planner turn.
  • Output of each step is structured data the next step can consume without model re-reading.
  • A typed language describes the API well (TypeScript / Python / Go). This effectively requires a unified interface schema upstream so the generated typed API stays correct as the underlying API evolves.
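The multi-step chaining criterion can be made concrete with a toy round-trip counter. The step function is a stub; only the boundary-crossing counts matter:

```typescript
// Count crossings of the planner/tool boundary for a 3-step chain.
let boundaryCrossings = 0;
const crossBoundary = <T>(result: T): T => {
  boundaryCrossings++; // network + planner latency is paid here
  return result;
};

const step = (input: number): number => input + 1; // stub tool work

// Tool-call style: every intermediate result returns to the planner,
// so a 3-step chain costs 3 round trips.
boundaryCrossings = 0;
let x = crossBoundary(step(0));
x = crossBoundary(step(x));
x = crossBoundary(step(x));
const toolCallTrips = boundaryCrossings;

// Code mode: the chain runs inside the sandbox as one script; the
// planner sees one request and one final answer.
boundaryCrossings = 0;
const finalAnswer = crossBoundary(step(step(step(0))));
const codeModeTrips = boundaryCrossings;
```

Each step's structured output feeds the next step directly in code, with no model re-reading in between.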

When it doesn't fit

  • Tiny API surface (<20 operations) — the tool-schema overhead is cheap, the code-gen indirection isn't worth it.
  • Tools whose side effects cannot be classified deterministically from method + body — read-vs-write classification is load-bearing when paired with patterns/credentialed-proxy-sandbox.
  • Environments where running generated code is infeasible (no sandbox, no isolate, no tooling for streaming results back through the planner).

Prerequisites

  • A typed API description of the tool surface (TypeScript / Protobuf / OpenAPI-generated types).
  • A sandbox capable of executing the generated code with the intended runtime semantics — and, ideally, a capability-based sandbox (no ambient authority, capabilities granted explicitly) so the model-written code cannot act beyond what it was granted. Canonical wiki substrate: Cloudflare Dynamic Workers (globalOutbound: null by default; bindings grant capabilities one at a time).
  • Ideally a credential boundary the sandbox cannot cross — see patterns/credentialed-proxy-sandbox for Agent Lee's Durable-Object-based instance.
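A simulation of the capability posture described above — not the actual Dynamic Workers API. Generated code receives only the bindings it was explicitly granted, with no ambient authority to reach anything else; all names (`runInSandbox`, `kvGet`) are hypothetical:

```typescript
// Capabilities are plain functions the host hands to the sandbox.
type Capability = (input: string) => string;

interface SandboxEnv {
  [binding: string]: Capability;
}

// The host grants capabilities one at a time when instantiating the
// sandbox; the script sees nothing it was not granted.
function runInSandbox(
  script: (env: SandboxEnv) => string,
  grants: SandboxEnv
): string {
  // Freeze so model-written code cannot add or swap bindings at runtime.
  return script(Object.freeze({ ...grants }));
}

// Only `kvGet` is granted; the script has no path to anything else.
const out = runInSandbox(
  (env) => env.kvGet("greeting"),
  { kvGet: (key) => (key === "greeting" ? "hello" : "") }
);

// An ungranted capability is simply absent, not merely forbidden —
// the analogue of globalOutbound: null plus explicit bindings.
const missing = runInSandbox((env) => String(!("deleteZone" in env)), {});
```

The real substrate enforces this at the runtime boundary rather than in userland, but the posture is the same: deny by default, grant explicitly.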

Canonical wiki instance

Cloudflare Code Mode, deployed in production as:

  • Agent Lee — dashboard agent; 2-tool MCP surface covers 3,000 API operations via Code Mode.
  • Cloudflare internal MCP Server Portal — 34 upstream tools collapsed to 2.
  • Code Mode MCP server itself — fits the entire Cloudflare API in <1,000 context tokens.
  • Project Think SDK (2026-04-15) — wires Code Mode into Tiers 1-2 of the execution ladder as the default tool-surface consumption layer; executes in Dynamic Workers with the capability-based sandbox posture.