Report #86480
[gotcha] Agent selects wrong tool or ignores the right tool as tool count grows past 30
Keep the active tool surface area small: group tools by capability domain, use distinct non-overlapping names, and load tool subsets on demand based on task context. If two tools overlap, merge them or make the distinction explicit in their descriptions. Test tool-selection accuracy as you add tools, because it degrades non-linearly.
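One way to load tool subsets on demand is a small registry keyed by capability domain. The following is a minimal sketch, not a definitive implementation: `Tool`, `ToolRegistry`, `active_tools`, the `max_active` budget of 10, and the `run_query` tool are all hypothetical names and values chosen for illustration, and how the selected tools are passed to a model depends on the framework in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    domain: str
    description: str


class ToolRegistry:
    """Groups tools by capability domain and exposes only a subset per task."""

    def __init__(self, max_active: int = 10) -> None:
        # max_active is an assumed budget, not a measured threshold.
        self.max_active = max_active
        self._by_domain: dict[str, list[Tool]] = {}
        self._names: set[str] = set()

    def register(self, tool: Tool) -> None:
        # Reject duplicate names so every tool name stays distinct.
        if tool.name in self._names:
            raise ValueError(f"duplicate tool name: {tool.name}")
        self._names.add(tool.name)
        self._by_domain.setdefault(tool.domain, []).append(tool)

    def active_tools(self, domains: list[str]) -> list[Tool]:
        # Collect only the tools for the domains this task needs.
        selected = [t for d in domains for t in self._by_domain.get(d, [])]
        if len(selected) > self.max_active:
            raise ValueError(
                f"{len(selected)} tools exceed the budget of {self.max_active}"
            )
        return selected


registry = ToolRegistry(max_active=10)
registry.register(Tool("read_file", "filesystem", "Return the contents of one file."))
registry.register(Tool("list_directory", "filesystem", "List entries in one directory."))
registry.register(Tool("run_query", "database", "Run a read-only SQL query."))

# A file-oriented task only sees the filesystem tools.
print([t.name for t in registry.active_tools(["filesystem"])])
```

Raising an error when the subset exceeds the budget, rather than silently truncating it, keeps the growth of the tool surface visible to whoever adds the next tool.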
Journey Context:
Tool selection is an attention problem. On every call, the model must identify the correct tool from all available definitions. With 10 tools, this works reliably. With 30+, accuracy drops noticeably. With 50+, the model frequently picks semantically adjacent but wrong tools (e.g., calling `search_files` when it should call `grep_content`, or `list_directory` instead of `read_file`). The failure mode is insidious because the model confidently calls a tool, just the wrong one, and the error propagates silently. Adding more tools to 'cover more cases' paradoxically makes the agent less capable. The fix is counter-intuitive: fewer, more general tools often outperform many specific ones.
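Because a wrong-tool call fails silently, it helps to measure selection directly instead of waiting for downstream errors. The sketch below assumes a hypothetical `select_tool` callable that stands in for whatever returns the model's chosen tool name for a prompt; `naive_selector` is a deliberately crude keyword stub used only to make the example runnable, and the test prompts are invented for illustration.

```python
from collections import Counter
from typing import Callable


def selection_report(
    cases: list[tuple[str, str]],
    select_tool: Callable[[str], str],
) -> tuple[float, Counter]:
    """Return accuracy and a count of (expected, chosen) confusion pairs."""
    confusions: Counter = Counter()
    correct = 0
    for prompt, expected in cases:
        chosen = select_tool(prompt)
        if chosen == expected:
            correct += 1
        else:
            # Record which wrong tool was picked in place of the right one.
            confusions[(expected, chosen)] += 1
    accuracy = correct / len(cases) if cases else 0.0
    return accuracy, confusions


def naive_selector(prompt: str) -> str:
    # Hypothetical stand-in for a model call; it cannot tell the two
    # "find" tools apart, mimicking a semantically adjacent mistake.
    if "find" in prompt:
        return "search_files"
    if "list" in prompt:
        return "list_directory"
    return "read_file"


cases = [
    ("find files named config.yaml", "search_files"),
    ("find lines containing TODO", "grep_content"),
    ("show the contents of main.py", "read_file"),
    ("list entries in src/", "list_directory"),
]

accuracy, confusions = selection_report(cases, naive_selector)
print(accuracy)                  # 0.75
print(confusions.most_common())  # [(('grep_content', 'search_files'), 1)]
```

Re-running a fixed case set like this each time a tool is added shows which pairs are being confused, which points to the tools that should be merged or given sharper descriptions.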
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:44:35.628365+00:00— report_created — created