

Code generation over tool calls

Pattern

Instead of presenting an LLM agent with N individually-described MCP tool schemas and asking it to "pick a tool and fill in its parameters," convert the tool surface into a typed API in a popular language (usually TypeScript) and ask the model to write code that uses the API. A sandboxed runtime executes the code and returns the final result.
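A minimal sketch of the inversion, in plain TypeScript. The API names (`listZones`, `purgeCache`) and the synchronous stub implementation are illustrative only, not the real surface of any product:

```typescript
// The tool surface exposed to the model: one typed API instead of N
// individually-described tool schemas. Names here are made up.
interface DnsApi {
  listZones(): { id: string; name: string }[];
  purgeCache(zoneId: string): boolean;
}

// Host-side binding the sandbox wires the API to (stubbed for the sketch).
const api: DnsApi = {
  listZones: () => [{ id: "z1", name: "example.com" }],
  purgeCache: (zoneId) => zoneId.startsWith("z"),
};

// What the model emits: ordinary TypeScript against the typed API, not a
// JSON tool-call payload. The sandbox executes it and returns only the
// final value; intermediate results never reach the planner.
function generatedScript(api: DnsApi): string[] {
  return api
    .listZones()
    .filter((z) => api.purgeCache(z.id))
    .map((z) => z.name);
}

const result = generatedScript(api);
```

The key move is that the model writes a function call in a language it has seen billions of times, and the host only ever sees `result`.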

Why it works

Three load-bearing arguments from production deployments:

  1. Training-data distribution. LLMs have seen "a huge amount of real-world TypeScript but very few tool call examples," so they are measurably more accurate at writing a function call in code than filling in a JSON tool schema (Source: Agent Lee launch post).
  2. Context-window compression. A tool schema per operation inflates the context window linearly with the API surface. A typed API description compresses N operations into one schema — Cloudflare fits its ~3,000-operation API into <1,000 tokens via Code Mode (Source: sources/2026-04-13-cloudflare-building-a-cli-for-all-of-cloudflare). Same framing re-quantified against the naive baseline in the 2026-04-15 Project Think launch: "~1,000 tokens vs ~1.17 million tokens for the naive tool-per-endpoint equivalent — 99.9% reduction" (Source: sources/2026-04-15-cloudflare-project-think-building-the-next-generation-of-ai-agents). Cloudflare's internal MCP Server Portal collapsed 34 upstream GitLab tools (~15K tokens / ~7.5% of a 200K window) behind a constant-size 2-tool portal surface via the same pattern (Source: sources/2026-04-20-cloudflare-internal-ai-engineering-stack).
  3. Round-trip collapse. A multi-step task that would take N planner↔tool turns collapses into one generated script whose intermediate results stay inside the sandbox. The model returns only the final answer, so the network and planner latency that previously scaled with N is paid once (Agent Lee post).
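The scaling claim in point 2 can be sketched with a toy measurement, using character counts as a stand-in for tokens and made-up operation names:

```typescript
// Toy illustration of context-cost scaling: N per-operation JSON tool
// schemas vs one typed API declaration for the same N operations.
const ops = Array.from({ length: 50 }, (_, i) => `op${i}`);

// Naive baseline: one JSON schema object per operation.
const toolSchemas = ops.map((name) =>
  JSON.stringify({
    name,
    description: `Invoke ${name}`,
    inputSchema: { type: "object", properties: { id: { type: "string" } } },
  })
);
const schemaCost = toolSchemas.join("\n").length;

// Code mode: the same surface described once as a typed interface.
const typedApi =
  "interface Api {\n" +
  ops.map((name) => `  ${name}(id: string): unknown;`).join("\n") +
  "\n}";
const typedCost = typedApi.length;
```

The interface in this sketch still grows with N, just at a much smaller constant factor; deployments like Agent Lee go further and keep the cost roughly constant in N by fronting the full API with a fixed 2-tool surface.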

When to reach for it

  • API surface is large (hundreds to thousands of operations) — per-tool schemas don't fit in context.
  • Tasks are frequently multi-step — chaining in code beats chaining by planner turn.
  • Output of each step is structured data the next step can consume without model re-reading.
  • A typed language describes the API well (TypeScript / Python / Go). This effectively requires a unified interface schema upstream so the generated typed API stays correct as the underlying API evolves.
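The multi-step chaining criterion can be made concrete with a toy round-trip counter. The step function is a stub; only the boundary-crossing counts matter:

```typescript
// Count crossings of the planner/tool boundary for a 3-step chain.
let boundaryCrossings = 0;
const crossBoundary = <T>(result: T): T => {
  boundaryCrossings++; // network + planner latency is paid here
  return result;
};

const step = (input: number): number => input + 1; // stub tool work

// Tool-call style: every intermediate result returns to the planner,
// so a 3-step chain costs 3 round trips.
boundaryCrossings = 0;
let x = crossBoundary(step(0));
x = crossBoundary(step(x));
x = crossBoundary(step(x));
const toolCallTrips = boundaryCrossings;

// Code mode: the chain runs inside the sandbox as one script; the
// planner sees one request and one final answer.
boundaryCrossings = 0;
const finalAnswer = crossBoundary(step(step(step(0))));
const codeModeTrips = boundaryCrossings;
```

Each step's structured output feeds the next step directly in code, with no model re-reading in between.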

When it doesn't fit

  • Tiny API surface (<20 operations) — the tool-schema overhead is cheap, the code-gen indirection isn't worth it.
  • Tools whose side effects cannot be classified deterministically from method + body — read-vs-write classification is load-bearing when paired with patterns/credentialed-proxy-sandbox.
  • Environments where running generated code is infeasible (no sandbox, no isolate, no tooling for streaming results back through the planner).

Prerequisites

  • A typed API description of the tool surface (TypeScript / Protobuf / OpenAPI-generated types).
  • A sandbox capable of executing the generated code with the intended runtime semantics — and, ideally, a capability-based sandbox (no ambient authority, capabilities granted explicitly) so the model-written code cannot act beyond what it was granted. Canonical wiki substrate: Cloudflare Dynamic Workers (globalOutbound: null by default; bindings grant capabilities one at a time).
  • Ideally a credential boundary the sandbox cannot cross — see patterns/credentialed-proxy-sandbox for Agent Lee's Durable-Object-based instance.
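A simulation of the capability posture described above — not the actual Dynamic Workers API. Generated code receives only the bindings it was explicitly granted, with no ambient authority to reach anything else; all names (`runInSandbox`, `kvGet`) are hypothetical:

```typescript
// Capabilities are plain functions the host hands to the sandbox.
type Capability = (input: string) => string;

interface SandboxEnv {
  [binding: string]: Capability;
}

// The host grants capabilities one at a time when instantiating the
// sandbox; the script sees nothing it was not granted.
function runInSandbox(
  script: (env: SandboxEnv) => string,
  grants: SandboxEnv
): string {
  // Freeze so model-written code cannot add or swap bindings at runtime.
  return script(Object.freeze({ ...grants }));
}

// Only `kvGet` is granted; the script has no path to anything else.
const out = runInSandbox(
  (env) => env.kvGet("greeting"),
  { kvGet: (key) => (key === "greeting" ? "hello" : "") }
);

// An ungranted capability is simply absent, not merely forbidden —
// the analogue of globalOutbound: null plus explicit bindings.
const missing = runInSandbox((env) => String(!("deleteZone" in env)), {});
```

The real substrate enforces this at the runtime boundary rather than in userland, but the posture is the same: deny by default, grant explicitly.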

Canonical wiki instance

Cloudflare Code Mode, deployed in production as:

  • Agent Lee — dashboard agent; 2-tool MCP surface covers 3,000 API operations via Code Mode.
  • Cloudflare internal MCP Server Portal — 34 upstream tools collapsed to 2.
  • Code Mode MCP server itself — fits the entire Cloudflare API in <1,000 context tokens.
  • Project Think SDK (2026-04-15) — wires Code Mode into Tiers 1-2 of the execution ladder as the default tool-surface consumption layer; executes in Dynamic Workers with the capability-based sandbox posture.