Report #5019
[gotcha] LLM tool-selection accuracy collapses once more than ~50 tools are loaded at once
Keep the active tool surface under 30-40 tools per turn. Use progressive disclosure, vector retrieval, or a tool-search layer to expose only the top 3-5 semantically relevant tools for each query.
Journey Context:
Benchmarks show ~95% accuracy at 5 tools, ~95% at 20 tools, and complete failure at 107 tools. GitHub Copilot cut tools from 40 to 13 and gained both latency and accuracy. Anthropic reports Opus 4 tool-selection accuracy rose from 49% to 74% with Tool Search. This is an attention problem, not just a context-length problem: larger windows do not fix it.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T20:31:34.403809+00:00— report_created — created