CONCEPT Cited by 2 sources

Tool-selection accuracy

Tool-selection accuracy is the probability that an LLM agent picks the correct tool from its available set for a given sub-task. Empirically, it degrades as the tool inventory grows: the larger the set of plausibly applicable tools, the more likely the model is to pick a suboptimal one or to call several where one would do (arXiv 2411.15399, as cited in Datadog's MCP post).
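As a rough illustration of the definition, the accuracy can be estimated as the fraction of labelled sub-tasks on which the agent chose the expected tool. This is a minimal sketch, not a method taken from the cited sources: the `SelectionCase` type, the `tool_selection_accuracy` function, and the tool names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelectionCase:
    """One labelled sub-task: the tool a reviewer expected and the tool the agent chose."""
    expected_tool: str
    chosen_tool: str


def tool_selection_accuracy(cases):
    """Return the fraction of cases in which the agent picked the expected tool."""
    if not cases:
        raise ValueError("need at least one labelled case")
    correct = sum(1 for case in cases if case.chosen_tool == case.expected_tool)
    return correct / len(cases)


# Hypothetical evaluation set: tool names are illustrative only.
cases = [
    SelectionCase("jira_search", "jira_search"),
    SelectionCase("confluence_find_by_id", "confluence_search"),  # wrong pick
    SelectionCase("gdocs_search", "gdocs_search"),
    SelectionCase("jira_find_by_name", "jira_find_by_name"),
]
accuracy = tool_selection_accuracy(cases)
print(accuracy)  # 0.75
```

Running the same labelled set against progressively larger tool inventories is one way to observe the degradation described above.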

Concrete observation (Dash)

Dropbox's Dash team named this failure mode in human terms: "analysis paralysis." As Dash gained integrations (Confluence, Google Docs, Jira, …) each providing its own retrieval tools (search, find-by-ID, find-by-name), the model spent increasing compute on deciding rather than acting:

"The problem wasn't broken tools; it was too many good ones. In human terms, Dash was facing analysis paralysis."

(Source: sources/2025-11-17-dropbox-how-dash-uses-context-engineering-for-smarter-ai)

Two failure sub-modes

  1. Wrong choice. Agent picks a sub-optimal tool when a better one is available.
  2. Redundant calls. Agent calls several tools for the same logical operation because it can't tell which is authoritative — Dash observed the model often had to call all retrieval tools "but didn't do so reliably."

Both waste the agent's context window (concepts/agent-context-window) on irrelevant tool outputs, which compounds context rot (concepts/context-rot).
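The two sub-modes can be told apart mechanically when reviewing an agent trace. The sketch below assumes each logical operation has been annotated with the tool a reviewer judged best; the `failure_modes` function, its labels, and the tool names are hypothetical and not part of Dash.

```python
def failure_modes(expected_tool, calls):
    """Label one logical operation with the failure sub-modes it exhibits.

    expected_tool: the tool a reviewer judged best for the operation.
    calls: names of the tools the agent actually invoked for it, in order.
    """
    modes = set()
    if expected_tool not in calls:
        modes.add("wrong_choice")      # the better tool was never used
    if len(set(calls)) > 1:
        modes.add("redundant_calls")   # several distinct tools for one operation
    return modes


# Hypothetical traces for the operation "fetch one Jira issue by its ID".
clean = failure_modes("jira_find_by_id", ["jira_find_by_id"])
wrong = failure_modes("jira_find_by_id", ["jira_search"])
redundant = failure_modes(
    "jira_find_by_id", ["jira_search", "jira_find_by_name", "jira_find_by_id"]
)
```

Under these assumptions `clean` is empty, `wrong` contains only `"wrong_choice"`, and `redundant` contains only `"redundant_calls"`; a trace can also carry both labels at once.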

Why it happens

  • Overlapping descriptions. Multiple tools with similar docstrings look the same to the LLM.
  • Shallow rubrics. The LLM selects on short tool descriptions; as the number of options grows, the signal-to-noise ratio of that selection drops.
  • Context pressure. Every tool's description sits in context on every turn; past some point, the tool descriptions themselves crowd out the reasoning budget.
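Two of these causes are easy to approximate offline. The sketch below flags tool pairs whose descriptions overlap heavily, using Jaccard similarity over word sets, and estimates how many description words are re-sent over a multi-turn run. It is an assumption-laden illustration: the function names, the 0.5 threshold, the tool descriptions, and the use of whitespace-separated words as a crude stand-in for tokens are all hypothetical.

```python
import re
from itertools import combinations


def word_set(text):
    """Lowercased alphanumeric words of a description, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two descriptions' word sets (0.0 if both are empty)."""
    wa, wb = word_set(a), word_set(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def overlapping_pairs(tools, threshold=0.5):
    """Return (name_a, name_b, score) for tool pairs at or above the threshold."""
    flagged = []
    for (name_a, desc_a), (name_b, desc_b) in combinations(sorted(tools.items()), 2):
        score = jaccard(desc_a, desc_b)
        if score >= threshold:
            flagged.append((name_a, name_b, score))
    return flagged


def description_words_over_run(tools, turns):
    """Words of tool description re-sent across a run (crude proxy for tokens)."""
    per_turn = sum(len(desc.split()) for desc in tools.values())
    return per_turn * turns


# Hypothetical tool inventory; names and descriptions are illustrative only.
tools = {
    "confluence_search": "Search documents by keyword in Confluence",
    "gdocs_search": "Search documents by keyword in Google Docs",
    "jira_find_by_id": "Fetch a single Jira issue by its ID",
}
pairs = overlapping_pairs(tools)
overhead = description_words_over_run(tools, turns=10)
```

Here the two search tools share five of eight distinct words (similarity 0.625) and are flagged, while the find-by-ID tool is not. The three descriptions total 21 words, so ten turns re-send 210 words of description before any reasoning happens.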

Mitigations

  • Tool-surface minimization. Keep the exposed surface small and flexible (patterns/tool-surface-minimization).
  • Unified retrieval tool. Collapse N app-specific retrieval tools into one index-backed retrieval tool (patterns/unified-retrieval-tool).
  • Specialized agents. When a capability is complex enough to need its own long tool description, move it into a dedicated sub-agent — the main agent sees one "invoke search sub-agent" tool, not the sub-agent's internal tool inventory (patterns/specialized-agent-decomposition).
  • Client-side tool search. Claude Code's tool-search feature and Kiro Powers load only relevant tools on demand rather than all at once (per Datadog post).
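To make the unified-retrieval idea concrete, the sketch below collapses per-app search tools into a single `retrieve` call over one in-memory index, with the source app as an optional filter instead of a separate tool. It is a toy under stated assumptions, not Dash's implementation: the `Doc` and `UnifiedIndex` names, the term-count scoring, and the sample documents are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    source: str   # originating app, e.g. "confluence" or "jira"
    doc_id: str
    title: str
    body: str


class UnifiedIndex:
    """One retrieval surface over documents from every integration."""

    def __init__(self, docs):
        self._docs = list(docs)

    def retrieve(self, query, source=None, limit=5):
        """Return up to `limit` docs matching the query, best match first.

        `source` narrows the search to one app; the agent still sees one tool.
        """
        terms = query.lower().split()
        hits = []
        for doc in self._docs:
            if source is not None and doc.source != source:
                continue
            text = f"{doc.title} {doc.body}".lower()
            score = sum(text.count(term) for term in terms)  # naive term count
            if score > 0:
                hits.append((score, doc))
        # Highest score first; ties broken by doc_id for stable output.
        hits.sort(key=lambda hit: (-hit[0], hit[1].doc_id))
        return [doc for _, doc in hits[:limit]]


# Hypothetical documents from three integrations.
index = UnifiedIndex([
    Doc("confluence", "c1", "Onboarding guide", "How to request laptop access"),
    Doc("jira", "j1", "Laptop request", "Ticket for new laptop"),
    Doc("gdocs", "g1", "Q3 roadmap", "Planning notes"),
])
all_hits = [doc.doc_id for doc in index.retrieve("laptop")]
confluence_hits = [doc.doc_id for doc in index.retrieve("laptop", source="confluence")]
```

With this data `all_hits` is `["j1", "c1"]` and `confluence_hits` is `["c1"]`. The design point is that the model chooses among arguments of one tool rather than among N near-identical tools, so the selection problem shrinks even as integrations are added.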

Seen in
