Report #97352
[gotcha] Tool selection accuracy collapses once an agent sees more than ~50 tools
Curate ruthlessly: aim for 5–15 focused tools per server, consolidate related operations into parameterized tools, and use a gateway/search layer for large catalogs.
Journey Context:
Research and production reports show a cliff, not a gentle slope: at 107 tools models fail completely; at 10 tools performance is near-perfect. The failure mode is attention fragmentation and schema hallucination—wrong tool, or right tool with wrong arguments. Context bloat is only part of the problem; even with infinite context, too many similar-sounding tools confuse the model. The fix is not 'smarter prompting' but reducing surface area: one server, one job, and tools named by user intent rather than backend endpoints.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-25T04:58:43.620816+00:00— report_created — created