CONCEPT Cited by 2 sources

Tool-selection accuracy

Tool-selection accuracy is the probability that an LLM agent picks the correct tool from its available set for a given sub-task. Empirically, it degrades as the tool inventory grows: the more tools that plausibly apply, the more likely the model is to pick a sub-optimal one or to call several of them sub-optimally (cited in Datadog's MCP post: arXiv 2411.15399).
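To check this degradation on a specific agent, tool-selection accuracy can be estimated offline as the fraction of labelled sub-tasks for which the agent's first tool call matches the expected tool, bucketed by inventory size. A minimal sketch, assuming a hypothetical eval format (`EvalCase`, with a labelled `expected_tool` and the observed `chosen_tool`); this is not the methodology of arXiv 2411.15399, and the toy cases are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class EvalCase:
    """One labelled sub-task (hypothetical eval format)."""
    inventory_size: int  # number of tools exposed to the agent for this case
    expected_tool: str   # tool a human labeller judged correct
    chosen_tool: str     # tool the agent actually called first


def selection_accuracy(cases: list[EvalCase]) -> float:
    """Fraction of cases where the agent picked the expected tool."""
    if not cases:
        return 0.0
    hits = sum(1 for c in cases if c.chosen_tool == c.expected_tool)
    return hits / len(cases)


def accuracy_by_inventory_size(cases: list[EvalCase]) -> dict[int, float]:
    """Group cases by inventory size so degradation with tool count is visible."""
    groups: dict[int, list[EvalCase]] = defaultdict(list)
    for c in cases:
        groups[c.inventory_size].append(c)
    return {size: selection_accuracy(group) for size, group in sorted(groups.items())}


# Toy, made-up cases: same two sub-tasks, small vs. large tool inventory.
cases = [
    EvalCase(5, "search", "search"),
    EvalCase(5, "get_by_id", "get_by_id"),
    EvalCase(40, "search", "jira_search"),
    EvalCase(40, "get_by_id", "get_by_id"),
]
print(accuracy_by_inventory_size(cases))  # → {5: 1.0, 40: 0.5}
```

Measuring per inventory size, rather than one global number, is what makes the effect of adding integrations visible.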

Concrete observation (Dash)

Dropbox's Dash team named this failure mode in human terms: "analysis paralysis." As Dash gained integrations (Confluence, Google Docs, Jira, …), each providing its own retrieval tools (search, find-by-ID, find-by-name), the model spent more and more compute deciding which tool to use rather than acting:

"The problem wasn't broken tools; it was too many good ones. In human terms, Dash was facing analysis paralysis."

(Source: sources/2025-11-17-dropbox-how-dash-uses-context-engineering-for-smarter-ai)

Two failure sub-modes

Wrong choice. The agent picks a sub-optimal tool when a better one is available.
Redundant calls. The agent calls several tools for the same logical operation because it can't tell which is authoritative. Dash observed that the model often had to call all of its retrieval tools "but didn't do so reliably."

Both waste concepts/agent-context-window space on irrelevant tool outputs, which compounds concepts/context-rot.
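Given a trace already segmented into logical operations, the two sub-modes can be told apart mechanically. A minimal sketch, assuming a hypothetical trace format (the list of tool names the agent called for one operation) and a labelled best tool; `Failure`, `classify`, and the tool names are illustrative, not taken from Dash.

```python
from enum import Enum


class Failure(Enum):
    NONE = "none"
    WRONG_CHOICE = "wrong_choice"
    REDUNDANT_CALLS = "redundant_calls"


def classify(calls: list[str], best_tool: str) -> Failure:
    """Classify the tool calls an agent made for ONE logical operation.

    calls     -- tool names invoked for this operation (hypothetical trace format)
    best_tool -- the tool a labeller considers optimal for this operation
    """
    distinct = set(calls)
    if len(distinct) > 1:
        # Several different tools for one operation: the agent couldn't
        # tell which was authoritative, so it tried more than one.
        return Failure.REDUNDANT_CALLS
    if distinct and best_tool not in distinct:
        # Exactly one tool was used, but not the best available one.
        return Failure.WRONG_CHOICE
    # Either the best tool was used, or nothing was called at all.
    return Failure.NONE


print(classify(["jira_search", "confluence_search", "gdocs_search"], "unified_search").name)
# → REDUNDANT_CALLS
```

Repeated calls to the same tool (for example, paging) are deliberately not counted as redundant here, since they do not reflect confusion between tools.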

Why it happens

Overlapping descriptions. Multiple tools with similar docstrings look interchangeable to the LLM.
Shallow rubrics. The LLM selects tools from short descriptions; with more options, the signal-to-noise ratio of that selection drops.
Context pressure. Every tool's description sits in context on every turn; past some inventory size, the descriptions alone starve the reasoning budget.
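The first and third causes can be checked cheaply before the agent ever runs. The sketch below flags tool pairs whose descriptions overlap heavily, using word-level Jaccard similarity as a crude stand-in for how alike they look to a model, and estimates the share of the context window the descriptions occupy on every turn. The 4-characters-per-token heuristic, the 0.5 threshold, and the example tools are all assumptions for illustration, not measurements from Dash or Datadog.

```python
import itertools
import re


def words(text: str) -> set[str]:
    """Lower-cased word set; a crude proxy for what a description 'says'."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlapping_pairs(tools: dict[str, str], threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Return tool pairs whose descriptions are at least `threshold` similar."""
    bags = {name: words(desc) for name, desc in tools.items()}
    pairs = []
    for a, b in itertools.combinations(sorted(tools), 2):
        score = jaccard(bags[a], bags[b])
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs


def description_share(tools: dict[str, str], context_tokens: int, chars_per_token: int = 4) -> float:
    """Rough fraction of the context window taken by tool names + descriptions each turn."""
    est_tokens = sum(len(name) + len(desc) for name, desc in tools.items()) / chars_per_token
    return est_tokens / context_tokens


# Hypothetical per-app tools, in the style of the Dash integrations.
tools = {
    "confluence_search": "Search Confluence pages by keyword and return matching documents.",
    "jira_search": "Search Jira issues by keyword and return matching documents.",
    "gdocs_find_by_id": "Fetch a single Google Doc by its ID.",
}
print(overlapping_pairs(tools))  # → [('confluence_search', 'jira_search', 0.64)]
```

A flagged pair is a candidate for merging or for rewriting one description so the two are clearly distinguishable.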

Mitigations

Tool-surface minimization. Keep the exposed surface small and flexible (patterns/tool-surface-minimization).
Unified retrieval tool. Collapse N app-specific retrieval tools into one index-backed retrieval tool (patterns/unified-retrieval-tool).
Specialized agents. When a capability is complex enough to need its own long tool description, move it into a dedicated sub-agent — the main agent sees one "invoke search sub-agent" tool, not the sub-agent's internal tool inventory (patterns/specialized-agent-decomposition).
Client-side tool search. Claude Code's tool-search feature and Kiro Powers load only relevant tools on demand rather than all at once (per the Datadog post).
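On-demand loading can be sketched as a selector that keeps the full inventory out of the prompt and surfaces only the few tools relevant to the current sub-task. This is a minimal sketch under stated assumptions, not how Claude Code's tool search or Kiro Powers work (the sources don't describe their internals): `ToolSpec`, `select_tools`, the word-overlap scoring, and `k` are all hypothetical, and a production selector would more plausibly rank by embedding similarity. The example inventory also exposes a single hypothetical `unified_search` tool in place of per-app retrieval tools, in the spirit of the unified-retrieval mitigation.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical tool description as it would be shown to the model."""
    name: str
    description: str


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_tools(task: str, inventory: list[ToolSpec], k: int = 3) -> list[ToolSpec]:
    """Return at most k tools whose descriptions share the most words with the task.

    Only the returned specs would be placed in the model's context for this turn;
    the rest of the inventory stays out of the prompt.
    """
    task_words = _words(task)
    scored = [(len(task_words & _words(t.description)), t) for t in inventory]
    # Drop tools with no overlap at all, then keep the k best.
    # list.sort is stable, so ties keep their inventory order.
    relevant = [pair for pair in scored if pair[0] > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in relevant[:k]]


inventory = [
    ToolSpec("unified_search", "Search all connected apps (Confluence, Jira, Google Docs) for documents matching a query."),
    ToolSpec("create_ticket", "Create a new Jira ticket with a title and body."),
    ToolSpec("send_email", "Send an email to a recipient."),
]
print([t.name for t in select_tools("find documents about the Q3 launch", inventory)])
# → ['unified_search']
```

The design choice is that narrowing happens in cheap code before the model sees anything, so the model only ever chooses among at most `k` candidates instead of the whole inventory.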

Seen in

sources/2025-11-17-dropbox-how-dash-uses-context-engineering-for-smarter-ai — Dash's "analysis paralysis" observation from app-integration growth; drove the move to a unified retrieval surface.
sources/2026-03-04-datadog-mcp-server-agent-tools — Datadog cites the arXiv paper and treats tool count as one of two primary reasons for structural minimization (the other being context budget).