Report #15521
[gotcha] Agent selects wrong tool when many tools are available — accuracy degrades past ~20 tools
Keep the active tool set under 20 tools per task. Use a two-stage approach: first, a lightweight retrieval or classification step to select relevant tools from a larger catalog, then expose only the selected subset to the LLM for the actual call. Implement tool namespacing or categorization to reduce ambiguity between similar tools.
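The two-stage approach above could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the `Tool` class, the `select_tools` helper, the stopword list, and the example catalog are all hypothetical, and plain keyword overlap stands in for whatever retrieval or classification step you would actually use (embeddings, a small classifier, etc.). Tool names carry a namespace prefix such as `calendar.` to reduce ambiguity between similar tools.

```python
from dataclasses import dataclass

MAX_ACTIVE_TOOLS = 20  # threshold suggested in this report; tune per model

# Tiny illustrative stopword list (assumption), so filler words don't match.
STOPWORDS = {"a", "an", "the", "to", "of", "in", "and", "with", "from", "about"}


@dataclass(frozen=True)
class Tool:
    name: str         # namespaced, e.g. "calendar.create_event"
    description: str


def tokenize(text: str) -> set[str]:
    """Lowercase word set minus stopwords; a stand-in for real retrieval."""
    cleaned = "".join(c if c.isalnum() else " " for c in text.lower())
    return set(cleaned.split()) - STOPWORDS


def select_tools(task: str, catalog: list[Tool],
                 limit: int = MAX_ACTIVE_TOOLS) -> list[Tool]:
    """Stage 1: rank catalog entries by keyword overlap with the task."""
    task_words = tokenize(task)
    scored = []
    for tool in catalog:
        overlap = len(task_words & tokenize(tool.name + " " + tool.description))
        if overlap > 0:
            scored.append((overlap, tool))
    # Highest overlap first; ties broken by name for deterministic ordering.
    scored.sort(key=lambda pair: (-pair[0], pair[1].name))
    return [tool for _, tool in scored[:limit]]


# Hypothetical catalog; a real one may hold hundreds of entries.
catalog = [
    Tool("calendar.create_event", "Create a calendar event with a title and start time"),
    Tool("calendar.list_events", "List calendar events in a date range"),
    Tool("email.send_message", "Send an email message to a recipient"),
    Tool("files.read_file", "Read the contents of a file from disk"),
]

# Stage 2 would pass only `active` (not the whole catalog) to the LLM call.
active = select_tools("Send an email to Dana about the launch", catalog)
```

Only the subset returned by `select_tools` would be serialized into the model's tool list; the rest of the catalog never reaches the prompt.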
Journey Context:
LLM function-calling accuracy degrades significantly as the number of available tools increases. Benchmarks consistently show that beyond ~20 tools, selection accuracy drops sharply: the model confuses tools with similar names, similar descriptions, or overlapping parameter schemas. This is not just a context-window issue; it is a fundamental attention and discrimination problem, because every tool definition competes for the model's attention and similar tools are hard to tell apart. Adding more tools feels like adding more capability, but past that threshold each new tool reduces the reliability of every existing tool. The fix is counterintuitive: give the model fewer tools, not more, and use a separate step to select which tools to expose.
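One way to spot the kind of name collisions described above is a quick audit of the catalog before exposing it. The sketch below is an assumption-laden illustration: `similar_pairs` and the 0.8 threshold are hypothetical, and string similarity from the standard library's `difflib.SequenceMatcher` is only a rough proxy for what a model actually confuses (it says nothing about overlapping descriptions or parameter schemas).

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_pairs(names: list[str], threshold: float = 0.8) -> list[tuple[str, str]]:
    """Return pairs of tool names whose string similarity meets the threshold."""
    pairs = []
    for a, b in combinations(sorted(names), 2):
        # ratio() is 2*M/T: M matched characters, T total characters in both.
        if SequenceMatcher(None, a, b).ratio() >= threshold:
            pairs.append((a, b))
    return pairs


# Hypothetical tool names for illustration.
flagged = similar_pairs(["search_docs", "search_doc", "send_email", "list_files"])
```

Flagged pairs are candidates for renaming, merging, or moving into separate namespaces so that they are not exposed together.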
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T00:20:19.977794+00:00— report_created — created