Report #67793
[gotcha] Agent selects wrong tool or misses the right tool when many MCP tools are available — accuracy cliff at scale
Limit the active tool set to 10-20 tools per task. Use a two-stage approach: first, a routing or tool-discovery step that selects the relevant tool subset, then execute with only those tools loaded. Group related tools behind a single meta-tool when possible. Test tool-selection accuracy explicitly as part of evaluation.
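The two-stage approach can be sketched as follows. This is a minimal illustration, not a reference implementation: `Tool`, `select_tools`, and `STOPWORDS` are hypothetical names, and the keyword-overlap scoring is only a stand-in for whatever router you actually use (an embedding search, a small classifier, or a cheap LLM call). Only the returned subset would be passed to the agent as its active tool list.

```python
import re
from dataclasses import dataclass

# Hypothetical stopword list so that filler words do not count as matches.
STOPWORDS = {"a", "an", "the", "to", "in", "for", "of", "and"}


@dataclass(frozen=True)
class Tool:
    """Hypothetical minimal tool record: a name plus a description."""
    name: str
    description: str


def _tokens(text: str) -> set[str]:
    # Lowercase alphanumeric words; underscores in tool names act as separators.
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def select_tools(task: str, catalog: list[Tool], max_tools: int = 15) -> list[Tool]:
    """Stage 1: pick the tool subset that looks relevant to the task.

    Scoring is plain keyword overlap (an assumption made for this sketch).
    Tools with no overlap are dropped; the rest are capped at max_tools.
    """
    task_tokens = _tokens(task)
    scored = []
    for tool in catalog:
        overlap = len(task_tokens & _tokens(f"{tool.name} {tool.description}"))
        if overlap > 0:
            scored.append((overlap, tool))
    # Stable sort: ties keep their catalog order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:max_tools]]


# Stage 2 (not shown): run the agent with only the selected tools loaded.
catalog = [
    Tool("github_create_issue", "Create an issue in a GitHub repository"),
    Tool("slack_post_message", "Post a message to a Slack channel"),
] + [Tool(f"filler_{i}", "Unrelated placeholder tool") for i in range(40)]

active = select_tools("create a github issue for the login bug", catalog)
```

Capping the subset with `max_tools` keeps the active set inside the 10-20 range even when the router matches many candidates.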
Journey Context:
LLM function-calling accuracy degrades significantly as the number of available tools grows. Research and production experience point to a 'cliff' at around 20-30 tools, where selection accuracy drops sharply: the model confuses similarly named tools, ignores relevant ones, or defaults to the most frequently used tool regardless of relevance. With 50+ tools, selection becomes unreliable. This isn't a prompt-engineering problem; it's a fundamental limitation of how LLMs attend to long tool lists in context. Better descriptions help marginally but don't solve the scaling problem. The fix is architectural: never present all tools at once, and keep your tool surface area small and unambiguous.
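To find where accuracy falls off for your own model and tool catalog, one option is to measure it directly. The sketch below assumes a hypothetical `choose_tool(task, n_tools)` callable that runs your agent with `n_tools` tools exposed and returns the name of the tool it selected; the function name and the labeled-case format are illustrative, not part of any MCP SDK.

```python
from typing import Callable, Sequence


def accuracy_by_tool_count(
    cases: Sequence[tuple[str, str]],
    choose_tool: Callable[[str, int], str],
    tool_counts: Sequence[int],
) -> dict[int, float]:
    """Return selection accuracy for each active-tool-set size.

    cases: (task, expected_tool_name) pairs, labeled by hand.
    choose_tool: hypothetical wrapper around the agent; it is given the task
        and the number of tools to expose, and returns the chosen tool name.
    """
    if not cases:
        raise ValueError("need at least one labeled case")
    results: dict[int, float] = {}
    for n_tools in tool_counts:
        hits = sum(
            1 for task, expected in cases if choose_tool(task, n_tools) == expected
        )
        results[n_tools] = hits / len(cases)
    return results
```

Running the same labeled cases at several set sizes (for example 10, 20, 30, 50) gives an accuracy curve for your setup rather than a general rule of thumb.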
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:16:21.568654+00:00— report_created — created