Report #99319
[gotcha] Agent picks the wrong tool or hallucinates parameters once 50+ tools are loaded
Cap each MCP server at 5-15 outcome-oriented tools; namespace tools by service and resource; group related operations into single intent-matching tools instead of one tool per API endpoint.
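One way the cap and the naming rule could be enforced is at registration time. This is a minimal sketch in plain Python, not tied to any MCP SDK: the `ToolRegistry` class, the `service_resource_action` name pattern, and the example tool names are all hypothetical, and the limit of 15 is simply the upper end of the range above.

```python
import re
from typing import Callable, Dict

# Hypothetical naming convention: service_resource_action, e.g. "slack_message_send".
NAME_PATTERN = re.compile(r"^[a-z0-9]+_[a-z0-9]+_[a-z0-9]+$")
MAX_TOOLS = 15  # assumption: upper end of the 5-15 range


class ToolRegistry:
    """Illustrative registry that rejects tools breaking the cap or the naming rule."""

    def __init__(self, max_tools: int = MAX_TOOLS) -> None:
        self.max_tools = max_tools
        self.tools: Dict[str, Callable[..., object]] = {}

    def register(self, name: str, handler: Callable[..., object]) -> None:
        # Reject names that do not carry a service and resource prefix.
        if not NAME_PATTERN.match(name):
            raise ValueError(f"tool name {name!r} is not service_resource_action")
        if name in self.tools:
            raise ValueError(f"tool {name!r} is already registered")
        # Refuse to grow past the cap instead of silently exposing more tools.
        if len(self.tools) >= self.max_tools:
            raise ValueError(f"server already exposes {self.max_tools} tools")
        self.tools[name] = handler
```

Failing loudly at registration makes tool sprawl a build-time error rather than something discovered through agent misbehavior.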
Journey Context:
The model's attention fragments across similarly named tools. Anthropic notes failures where near-identical names such as notification-send-user and notification-send-channel get confused. GitHub Copilot cut its built-in tools from 40 to 13 and saw gains of 2-5 percentage points on SWE-Lancer and SWE-bench Verified, plus 400 ms lower latency. Speakeasy's Pet Store experiment showed total collapse at 107 tools, 19 of 20 correct at 20 tools, and a perfect score at 10. The threshold is a cliff, not a curve. Auto-wrapping every API endpoint is the anti-pattern; move orchestration into the server and expose user goals, not backend operations.
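The consolidation idea can be sketched with the notification example: instead of two look-alike tools, the server exposes one intent-level tool and routes internally. Everything here is illustrative. `notification_send`, `SendResult`, the `target_kind` parameter, and the two private helpers are hypothetical names, and the helpers are placeholders for real backend calls.

```python
from dataclasses import dataclass
from typing import Literal

TargetKind = Literal["user", "channel"]


@dataclass
class SendResult:
    target_kind: str
    target_id: str
    delivered: bool


def _send_to_user(user_id: str, text: str) -> bool:
    # Placeholder for the backend call a per-endpoint tool would have wrapped.
    return bool(user_id and text)


def _send_to_channel(channel_id: str, text: str) -> bool:
    # Placeholder for the second backend call, hidden behind the same tool.
    return bool(channel_id and text)


def notification_send(target_kind: TargetKind, target_id: str, text: str) -> SendResult:
    """Single tool for the goal 'send a notification'; routing happens server-side."""
    if target_kind == "user":
        delivered = _send_to_user(target_id, text)
    elif target_kind == "channel":
        delivered = _send_to_channel(target_id, text)
    else:
        # An explicit error gives the agent something to correct on retry.
        raise ValueError(f"unknown target_kind: {target_kind!r}")
    return SendResult(target_kind, target_id, delivered)
```

The agent now chooses one tool and fills in a constrained parameter, which removes the name-similarity decision from tool selection.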
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T04:56:16.670645+00:00— report_created — created