Report #58697
[gotcha] LLM tool selection accuracy collapses with 50+ tools registered
Implement two-phase tool resolution: first, a lightweight classifier or embedding-based retrieval selects 5-15 candidate tools based on the user query; then only those candidates are presented to the LLM. Alternatively, namespace tools by domain and only load the relevant namespace per task. Never expose the full tool catalog to the model on every turn.
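A minimal sketch of the two-phase idea, assuming an embedding-based retriever for phase 1. The `embed` function here is a stand-in (a bag-of-words term-count vector with cosine similarity), not a real embedding model, and `Tool` and `select_candidates` are hypothetical names. In production you would swap in a real embedding model and precompute the tool vectors once instead of per query.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    description: str


def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words term-count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_candidates(query: str, catalog: list[Tool], k: int = 10) -> list[Tool]:
    """Phase 1: score every tool against the query and keep only the top k."""
    q = embed(query)
    scored = [(cosine(q, embed(f"{t.name} {t.description}")), t) for t in catalog]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:k]]


catalog = [
    Tool("send_email", "Send an email message to a recipient"),
    Tool("search_email", "Search the mailbox for email messages"),
    Tool("create_invoice", "Create a billing invoice for a customer"),
    Tool("get_weather", "Get the current weather forecast for a city"),
]

# Phase 2: only these candidates go into the LLM request's tool list.
candidates = select_candidates("what is the weather forecast in Paris", catalog, k=2)
```

The retriever only has to be good enough to put the right tool somewhere in the top k; the LLM still makes the final choice, but from a short list instead of the whole catalog.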
Journey Context:
Research and production experience show that LLM tool selection accuracy degrades non-linearly as tool count increases. Beyond roughly 20-30 tools, the model frequently selects the wrong tool or a similar-sounding one. The Model Context Protocol (MCP) spec's flat tools/list endpoint encourages registering all tools at once, with no hierarchy or relevance filtering. Developers assume 'more tools equals more capability', but the actual result is 'more tools equals less reliability'. The model's attention is spread thin across every tool definition, and semantically similar tools create confusion. The fix is progressive disclosure: surface only the tools relevant to the current task context, which preserves both selection accuracy and context-window budget.
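A minimal sketch of namespace-scoped progressive disclosure. It assumes a hypothetical naming convention where each tool name carries a `namespace.` prefix, and the tool dicts only approximate the shape of MCP tools/list entries (name, description, inputSchema). How the relevant namespaces are chosen per task (a router, a classifier, or explicit configuration) is left to the caller here.

```python
from collections import defaultdict


def group_by_namespace(tools: list[dict]) -> dict[str, list[dict]]:
    """Index tools by the name segment before the first '.' (hypothetical convention)."""
    index: dict[str, list[dict]] = defaultdict(list)
    for tool in tools:
        namespace, sep, _ = tool["name"].partition(".")
        # Tools without a prefix fall into a catch-all "default" namespace.
        index[namespace if sep else "default"].append(tool)
    return dict(index)


def tools_for_task(index: dict[str, list[dict]], namespaces: list[str]) -> list[dict]:
    """Expose only the tools in the namespaces relevant to the current task."""
    return [tool for ns in namespaces for tool in index.get(ns, [])]


full_catalog = [
    {"name": "billing.create_invoice", "description": "Create an invoice", "inputSchema": {"type": "object"}},
    {"name": "billing.refund_payment", "description": "Refund a payment", "inputSchema": {"type": "object"}},
    {"name": "mail.send_email", "description": "Send an email", "inputSchema": {"type": "object"}},
    {"name": "mail.search_email", "description": "Search the mailbox", "inputSchema": {"type": "object"}},
]

index = group_by_namespace(full_catalog)
# A billing task sees two tools instead of the full catalog.
visible = tools_for_task(index, ["billing"])
```

Building the index once and filtering per task keeps the catalog complete on the server side while the model only ever sees the slice it needs.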
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:00:52.912482+00:00— report_created — created