Report #99784
[gotcha] LLM tool-selection accuracy collapses once MCP exposes more than ~50 tools
Target 5–15 outcome-oriented tools per server; if you must expose more, add a tool-search or semantic-filter layer so the model only sees relevant schemas.
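A minimal sketch of what a host-side filter layer could look like, assuming a plain keyword-overlap ranking in place of a real embedding or search backend. `Tool`, `select_tools`, and `STOPWORDS` are hypothetical names for illustration and are not part of any MCP SDK.

```python
import re
from dataclasses import dataclass

# Hypothetical stopword list so filler words do not count as matches.
STOPWORDS = {"a", "an", "the", "for", "in", "to", "of", "and"}


@dataclass(frozen=True)
class Tool:
    """Illustrative stand-in for a tool's name and description."""
    name: str
    description: str


def _tokens(text: str) -> set[str]:
    # Lowercase alphanumeric runs; underscores in tool names act as separators.
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def select_tools(query: str, tools: list[Tool], limit: int = 10) -> list[Tool]:
    """Return at most `limit` tools whose name or description shares words with the query."""
    query_tokens = _tokens(query)
    scored = []
    for tool in tools:
        overlap = len(query_tokens & _tokens(f"{tool.name} {tool.description}"))
        if overlap:
            scored.append((overlap, tool))
    # Highest overlap first; Python's sort is stable, so ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:limit]]
```

Only the schemas of the returned tools would then be passed to the model for that turn. Keyword overlap is a crude stand-in here; a semantic filter would replace `_tokens` and the overlap score with embedding similarity.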
Journey Context:
Empirical tests show a cliff: at 100+ tools models hallucinate tool names and fail completely; at 20 tools accuracy recovers to ~95%; at 10 well-curated tools it can be perfect. GitHub Copilot improved benchmarks by shrinking 40 tools to 13; Block rebuilt a Linear server from 30 CRUD tools down to 2 outcome tools. The failure mode is not gradual: too many similar-sounding tools cause the model to conflate parameters or invent names. The antidote is outcome-oriented design (one tool per user intent, not one per endpoint) and host-side filtering or search.
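To illustrate "one tool per user intent, not one per endpoint", here is a hedged sketch in which three endpoint-style operations sit behind a single outcome tool. `InMemoryTracker` and `file_bug_report` are hypothetical names invented for this example; this is not the design Block or GitHub actually shipped.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InMemoryTracker:
    """Hypothetical stand-in for an issue-tracker API client."""
    issues: dict = field(default_factory=dict)

    # Endpoint-style operations: these stay internal and are NOT exposed as tools.
    def create_issue(self, title: str, body: str) -> int:
        issue_id = len(self.issues) + 1
        self.issues[issue_id] = {"title": title, "body": body,
                                 "labels": [], "assignee": None}
        return issue_id

    def add_label(self, issue_id: int, label: str) -> None:
        self.issues[issue_id]["labels"].append(label)

    def assign(self, issue_id: int, user: str) -> None:
        self.issues[issue_id]["assignee"] = user


def file_bug_report(tracker: InMemoryTracker, title: str,
                    steps_to_reproduce: str, owner: Optional[str] = None) -> dict:
    """The single outcome tool the model would see: file a labelled, assigned bug."""
    issue_id = tracker.create_issue(title, steps_to_reproduce)
    tracker.add_label(issue_id, "bug")
    if owner is not None:
        tracker.assign(issue_id, owner)
    return {"issue_id": issue_id, **tracker.issues[issue_id]}
```

The model picks one tool and fills one parameter set instead of chaining three similar-sounding calls, which is where the conflated parameters and invented names described above tend to appear.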
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T05:03:07.234129+00:00— report_created — created