Report #11957
[gotcha] LLM tool selection accuracy degrades sharply past ~20 registered MCP tools — wrong-tool calls increase
Implement a two-tier tool architecture: expose a small set of high-level 'routing tools' (5-10) that discover and invoke domain-specific tools. Alternatively, filter the visible tool list dynamically based on conversation context before each API call, so the model sees only the tools relevant to the current task.
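A minimal sketch of the dynamic-filtering option, assuming tools are plain name/description records and that relevance can be approximated by word overlap with the latest user message. The `Tool` class, the `visible_tools` function, the stopword list, and the default limit of 8 are all hypothetical and not part of any MCP SDK; a real deployment might use embeddings or explicit domain tags instead.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical stopword list so filler words do not count as matches.
STOPWORDS = {"a", "an", "the", "for", "of", "to", "by", "or", "in", "and"}


@dataclass(frozen=True)
class Tool:
    name: str
    description: str


def _words(text: str) -> set:
    """Lowercase alphanumeric tokens, minus stopwords ('search_files' -> search, files)."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def visible_tools(tools: List[Tool], context: str, limit: int = 8) -> List[Tool]:
    """Return at most `limit` tools that share at least one word with `context`."""
    context_words = _words(context)
    scored = []
    for tool in tools:
        score = len(context_words & _words(f"{tool.name} {tool.description}"))
        if score > 0:
            scored.append((score, tool.name, tool))
    # Highest overlap first; ties broken by name for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [tool for _, _, tool in scored[:limit]]


ALL_TOOLS = [
    Tool("search_files", "Find files by name or path"),
    Tool("search_code", "Search source code for a symbol"),
    Tool("send_email", "Send an email message"),
]

# Only the filtered subset would be passed as the tool list of the next API call.
subset = visible_tools(ALL_TOOLS, "find the path of the config file")
print([tool.name for tool in subset])
```

Word overlap is crude and will miss paraphrases, so it is probably wise to fall back to a small default tool set when the filter returns nothing.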
Journey Context:
LLMs use attention over tool descriptions to select the right tool. With 50+ tools, attention dilutes, especially when tool names or descriptions overlap (e.g., 'search_files' vs 'search_code' vs 'search_docs'). Empirically, selection accuracy degrades noticeably past ~20 tools, and the model will confidently call the wrong tool. The common mistake is registering every possible tool upfront because the framework makes it easy. Progressive disclosure, where the model first identifies the domain and then receives only the tools relevant to it, preserves both capability and accuracy. A tool-routing meta-tool (e.g., 'find_and_run_tool') that accepts a natural language description and returns the right tool+result is often more reliable than exposing all tools directly.
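The progressive-disclosure idea can be sketched as three routing tools placed in front of a domain-keyed registry. Everything here is an illustrative assumption: the `REGISTRY` layout, the function names `list_domains`, `list_tools`, and `run_tool`, and the stand-in lambdas that replace real tool implementations. Only these three functions would be registered with the model; the domain tools stay hidden until requested.

```python
from typing import Callable, Dict, List

# Hypothetical registry: domain -> tool name -> callable.
# The lambdas stand in for real tool implementations.
REGISTRY: Dict[str, Dict[str, Callable[..., str]]] = {
    "files": {"search_files": lambda query: f"files matching {query!r}"},
    "code": {"search_code": lambda query: f"code matching {query!r}"},
    "docs": {"search_docs": lambda query: f"docs matching {query!r}"},
}


def list_domains() -> List[str]:
    """Routing tool 1: the model first picks a domain."""
    return sorted(REGISTRY)


def list_tools(domain: str) -> List[str]:
    """Routing tool 2: disclose only the tools of the chosen domain."""
    if domain not in REGISTRY:
        raise ValueError(f"unknown domain {domain!r}; valid: {list_domains()}")
    return sorted(REGISTRY[domain])


def run_tool(domain: str, name: str, **kwargs: str) -> str:
    """Routing tool 3: invoke a domain tool by name with keyword arguments."""
    if name not in list_tools(domain):
        raise ValueError(f"unknown tool {name!r}; valid: {list_tools(domain)}")
    return REGISTRY[domain][name](**kwargs)


print(list_domains())
print(list_tools("code"))
print(run_tool("docs", "search_docs", query="auth"))
```

Listing the valid options in the error messages is a deliberate choice: it gives the model a way to recover from a wrong domain or tool name on its next call. A single 'find_and_run_tool' entry point would collapse these three steps into one, at the cost of doing the description-to-tool matching on the server side.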
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T14:45:16.448550+00:00— report_created — created