Report #39629
[gotcha] Model selects wrong tool or fails to find the right tool when 50+ tools are registered
Limit simultaneously exposed tools to 10-20; group tools into namespaces or categories and load only the relevant group per task; use discriminative, action-verb-first tool names and descriptions that include negative examples ('Use this for X, NOT for Y').
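A minimal sketch of what such descriptions might look like, with a small check for the 10-20 limit. The `name` / `description` / `input_schema` layout is an assumption modeled on common tool-calling APIs, and `lint_toolset` is a hypothetical helper whose checks are rough heuristics, not part of any library.

```python
# Hypothetical tool definitions. The dict layout is an assumption; adapt it
# to whatever schema your tool-calling framework expects.
FIND_FILES = {
    "name": "find_files",
    "description": (
        "Find files by NAME or glob pattern. "
        "Use this to locate paths, NOT to search text inside files."
    ),
    "input_schema": {
        "type": "object",
        "properties": {"glob": {"type": "string"}},
        "required": ["glob"],
    },
}

GREP_FILES = {
    "name": "grep_files",
    "description": (
        "Search file CONTENTS for a regex pattern. "
        "Use this to find text inside files, NOT to locate files by name."
    ),
    "input_schema": {
        "type": "object",
        "properties": {"pattern": {"type": "string"}},
        "required": ["pattern"],
    },
}


def lint_toolset(tools, max_tools=20):
    """Return a list of warnings for a set of tools exposed together.

    Heuristic only: flags an oversized set and descriptions that lack a
    'NOT for Y' style negative example.
    """
    warnings = []
    if len(tools) > max_tools:
        warnings.append(f"{len(tools)} tools exposed; limit is {max_tools}")
    for tool in tools:
        if "NOT" not in tool["description"]:
            warnings.append(f"{tool['name']}: description has no negative example")
    return warnings
```

Running `lint_toolset` over each group before it is exposed to the model is one cheap way to catch an oversized or ambiguous set early.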
Journey Context:
LLM tool-selection accuracy degrades non-linearly as the tool count rises. With 50+ tools, many with overlapping capabilities (e.g., 'search_files' vs 'find_files' vs 'grep_files'), the model's attention diffuses across similar descriptions and it picks the wrong one or hallucinates arguments. The common mistake is registering every tool upfront for 'completeness.' The right call is progressive disclosure: expose a tool-discovery meta-tool or category-based routing, then load the specific tool subset needed. This trades a small initial routing step for a large gain in selection accuracy. Anthropic's own tool-use guidance explicitly recommends minimizing simultaneous tool count.
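Progressive disclosure can be sketched roughly as follows. `ToolRegistry`, `routing_tool`, and `tools_for` are hypothetical names invented for illustration, and the tool dict layout is an assumption; the actual model call is left to the host application.

```python
from dataclasses import dataclass, field


@dataclass
class ToolRegistry:
    """Hypothetical registry that exposes one category of tools at a time."""

    categories: dict = field(default_factory=dict)  # category -> list of tool dicts
    max_exposed: int = 20

    def register(self, category, tool):
        self.categories.setdefault(category, []).append(tool)

    def routing_tool(self):
        """Meta-tool shown to the model first: it only picks a category."""
        return {
            "name": "select_tool_category",
            "description": (
                "Choose the tool category needed for the task. "
                "Use this to load tools, NOT to perform the task itself."
            ),
            "input_schema": {
                "type": "object",
                "properties": {
                    "category": {"type": "string", "enum": sorted(self.categories)}
                },
                "required": ["category"],
            },
        }

    def tools_for(self, category):
        """Return the subset to expose after the model has chosen a category."""
        tools = self.categories[category]  # raises KeyError for unknown categories
        if len(tools) > self.max_exposed:
            raise ValueError(
                f"category {category!r} has {len(tools)} tools; split it further"
            )
        return list(tools)
```

In this sketch the first request carries only `routing_tool()`; once the model returns a category, the host issues the next request with `tools_for(category)`, so the model never sees more than one group at a time.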
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:59:32.696504+00:00— report_created — created