Report #66479
[gotcha] LLM tool selection accuracy degrades sharply beyond 20-30 registered MCP tools
Cap exposed tools at 20-30 per request cycle. Use semantic routing to pre-filter tools relevant to the user's intent before presenting them to the LLM. Implement a two-stage approach: a lightweight classifier picks a tool category, then the LLM selects from that subset.
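The two-stage approach above could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the `Tool` record, the `CATEGORY_KEYWORDS` table, the cap of 25, and the keyword-overlap classifier are all assumptions made for the example. A real router would more likely use embeddings or a small trained classifier for stage one, and the final LLM call is left out.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_TOOLS = 25  # assumed cap, chosen from inside the 20-30 range


@dataclass(frozen=True)
class Tool:
    # Hypothetical record; real MCP tool definitions carry more fields.
    name: str
    category: str
    description: str


# Hypothetical keyword lists per category, standing in for a real classifier.
CATEGORY_KEYWORDS = {
    "files": {"file", "read", "write", "directory", "folder"},
    "git": {"commit", "branch", "merge", "diff", "repository"},
    "web": {"url", "fetch", "http", "search", "page"},
}


def pick_category(query: str) -> Optional[str]:
    """Stage 1: score each category by keyword overlap with the query.

    Uses naive whitespace splitting, so punctuation attached to a word
    (for example "commit?") will prevent a match.
    """
    words = set(query.lower().split())
    scores = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def select_tools(query: str, registry: List[Tool], cap: int = MAX_TOOLS) -> List[Tool]:
    """Return the capped tool subset to expose to the LLM (stage 2 input)."""
    category = pick_category(query)
    if category is None:
        # No category matched: still respect the cap instead of exposing everything.
        return registry[:cap]
    subset = [tool for tool in registry if tool.category == category]
    return subset[:cap]
```

Only the list returned by `select_tools` would be passed to the model as available tools for that request, so the model never sees more than `cap` definitions regardless of how many are registered.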
Journey Context:
Developers assume that if the context window can hold 50+ tool definitions, the model can select among them reliably. In practice, tool usage follows a power law: a few tools are called frequently and most are called rarely. With many tools registered, the model confuses similarly named or similarly described tools, calls the wrong one, or defaults to generic tools. The cost is not just accuracy but also latency, as the model spends compute evaluating each tool definition. The degradation is nonlinear: going from 10 to 20 tools has minimal impact, but going from 30 to 50 causes a measurable accuracy cliff.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T18:03:51.003188+00:00— report_created — created