CONCEPT Cited by 1 source

Tool overlap poisons agent accuracy

A design lesson for LLM agent tool surfaces: two tools that overlap in scope make the model worse, not better. The model gets confused, calls the wrong one, and the resulting workflow degrades. The remedy is to consolidate overlapping tools into a single tool with a mode parameter rather than a family of near-identical tools.

Canonical wiki framing

From the Cloudflare Skipper / Town Lake launch post:

"Tool overlap is poison. We initially exposed every variant of every tool: three different 'fetch results' tools, two 'search' tools, several 'list' tools. The model got confused and called the wrong one. We consolidated. Now fetch_results has a mode parameter (inject / display / both) instead of three separate tools. Every tool has a single reason to exist."

This is one of four explicit design lessons named in the post, alongside "less prompting is more", "code (not metadata) captures meaning", and "memory matters."

The mechanism

Two failure modes when tools overlap:

  1. Selection confusion — when fetch_results_inject, fetch_results_display, and fetch_results_both all exist, the model has to decide which one to call before it knows the workflow's full shape. It picks based on prompt-time guesses that may be wrong, then can't recover without spending a round-trip to call a different one.
  2. Schema-token bloat: three near-identical tool schemas cost roughly three times the tokens of one tool with a mode parameter, consuming context window for marginal expressivity.

Both fail in the same direction: more tool definitions, worse agent performance.
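
The schema-bloat point can be checked roughly by serializing both tool surfaces and comparing sizes. The sketch below is illustrative only: the schema shape, the `query_id` and `limit` parameters, and the descriptions are assumptions, not Skipper's actual definitions, and character count is only a crude proxy for tokens.

```python
import json

def tool_schema(name, description, extra_properties=None):
    """Build a hypothetical JSON-schema-style tool definition."""
    properties = {
        "query_id": {"type": "string", "description": "Query whose results to fetch."},
        "limit": {"type": "integer", "description": "Maximum number of rows."},
    }
    properties.update(extra_properties or {})
    return {
        "name": name,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": ["query_id"],
        },
    }

# Before: three near-identical tools that differ only in output mode.
separate_tools = [
    tool_schema("fetch_results_inject", "Fetch results and inject them."),
    tool_schema("fetch_results_display", "Fetch results and display them."),
    tool_schema("fetch_results_both", "Fetch results, inject and display them."),
]

# After: one tool whose mode parameter carries the disambiguation.
consolidated_tools = [
    tool_schema(
        "fetch_results",
        "Fetch results.",
        {"mode": {"type": "string", "enum": ["inject", "display", "both"]}},
    )
]

def serialized_size(tools):
    """Character length of the serialized definitions (a rough token proxy)."""
    return len(json.dumps(tools))

separate_size = serialized_size(separate_tools)
consolidated_size = serialized_size(consolidated_tools)
print(f"separate: {separate_size} chars, consolidated: {consolidated_size} chars")
```

The consolidated surface is not exactly one third of the original, because the `mode` parameter itself costs a few tokens, but the shared parameters are now described once instead of three times.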

The remedy: single reason to exist

The Cloudflare framing is the architectural rule:

"Every tool has a single reason to exist."

Operational test: if two tools share most of their schema and differ only in output mode / format / scope, collapse them into one tool with a parameter. The parameter carries the disambiguation; the tool surface stays minimal.
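
One way to apply the operational test mechanically is to compare parameter names across the tool surface and flag pairs that share most of them. This is a minimal sketch under stated assumptions: the tool list, the parameter names, and the 0.8 threshold for "most of their schema" are all hypothetical, and a fuller check would also compare parameter types and descriptions.

```python
from itertools import combinations

def parameter_overlap(tool_a, tool_b):
    """Jaccard similarity of the two tools' parameter names (0.0 to 1.0)."""
    a, b = set(tool_a["parameters"]), set(tool_b["parameters"])
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def consolidation_candidates(tools, threshold=0.8):
    """Return name pairs whose parameter overlap meets the threshold."""
    return [
        (x["name"], y["name"])
        for x, y in combinations(tools, 2)
        if parameter_overlap(x, y) >= threshold
    ]

# Hypothetical tool surface, reduced to names and parameter names.
tools = [
    {"name": "fetch_results_inject", "parameters": ["query_id", "limit"]},
    {"name": "fetch_results_display", "parameters": ["query_id", "limit"]},
    {"name": "fetch_results_both", "parameters": ["query_id", "limit"]},
    {"name": "search", "parameters": ["text", "max_hits"]},
]

candidates = consolidation_candidates(tools)
for pair in candidates:
    print("consider merging:", pair)
```

A flagged pair is only a candidate for review: two tools can share a schema and still have distinct reasons to exist.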

The Skipper implementation

Before:                          After:
- fetch_results_inject           - fetch_results(mode: "inject" | "display" | "both")
- fetch_results_display
- fetch_results_both

Before consolidation, the model must classify the request into one of three tools at the moment of calling. After consolidation, the model picks the tool, then picks the mode: the decisions are sequenced, the schema is shared, and the parameter states the intent explicitly.
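
A consolidated tool of this shape might look like the following sketch. The source names only the tool and its three modes, so everything else is an assumption: the `query_id` parameter, the `_load_rows` stub, the response keys, and the reading of "inject" as returning rows for the model's context and "display" as rendering them for the user.

```python
from enum import Enum

class Mode(str, Enum):
    INJECT = "inject"
    DISPLAY = "display"
    BOTH = "both"

def _load_rows(query_id):
    """Hypothetical stand-in for the real result store."""
    return [{"id": 1, "status": "ok"}, {"id": 2, "status": "failed"}]

def fetch_results(query_id, mode="both"):
    """Single fetch tool; the mode parameter replaces three separate tools."""
    selected = Mode(mode)  # raises ValueError for an unknown mode
    rows = _load_rows(query_id)
    response = {"query_id": query_id, "mode": selected.value}
    if selected in (Mode.INJECT, Mode.BOTH):
        # Assumed meaning: hand the raw rows back for the model's context.
        response["context"] = rows
    if selected in (Mode.DISPLAY, Mode.BOTH):
        # Assumed meaning: produce a rendering intended for the user.
        response["display"] = "\n".join(str(row) for row in rows)
    return response

injected = fetch_results("q1", mode="inject")
both = fetch_results("q1")
```

Because the fetch logic is shared, choosing the wrong mode costs one parameter change on the next call instead of a switch to a different tool with its own schema.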

Composes with Code Mode

The lesson takes its sharpest form in Code Mode's structural argument: instead of 30+ individual MCP tools, expose two (search, execute) and let the model write JavaScript that calls the full toolset programmatically. The cited figure is a "99.9% reduction" in token cost versus the naive tool-per-endpoint approach (canonicalised at patterns/code-mode-mcp-for-data-agent).

The same lesson at two scales:

  • Within a domain: collapse three fetch_results_* tools into one parameterised tool.
  • Across domains: collapse 30 specialised tools into two meta-tools (search / execute) plus the JavaScript runtime.

Both are expressions of "every tool has a single reason to exist" — applied at the per-tool level (one fetch operation) and the per-surface level (one MCP server).
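
The two-meta-tool surface can be sketched as a registry plus a `search` function and an `execute` function. This is a Python stand-in for illustration only: Code Mode as described above has the model write JavaScript, and the registry contents, function signatures, and `result` convention here are all hypothetical. Python's `exec` is not a sandbox, so a real deployment would need an isolated runtime.

```python
from types import SimpleNamespace

# Hypothetical endpoint registry: name -> (description, callable).
REGISTRY = {
    "list_tables": ("List available table names.", lambda: ["orders", "users"]),
    "count_rows": (
        "Count rows in a table.",
        lambda table: {"orders": 3, "users": 2}[table],
    ),
}

def search(query):
    """Meta-tool 1: find endpoints whose name or description mentions the query."""
    needle = query.lower()
    return [
        {"name": name, "description": description}
        for name, (description, _) in REGISTRY.items()
        if needle in name.lower() or needle in description.lower()
    ]

def execute(code):
    """Meta-tool 2: run model-written code with every endpoint on `tools`.

    The code reports its answer by assigning to `result`.
    NOT sandboxed; illustration only.
    """
    tools = SimpleNamespace(**{name: fn for name, (_, fn) in REGISTRY.items()})
    scope = {"tools": tools}
    exec(code, scope)
    return scope.get("result")

matches = search("count")
counts = execute("result = {t: tools.count_rows(t) for t in tools.list_tables()}")
```

Only the two meta-tool schemas occupy the context window; the registry can grow without adding tool definitions for the model to choose between.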
