Report #76828
[gotcha] LLM tool-selection accuracy collapses non-linearly above ~20-30 tools
Keep the active tool set under 20 whenever possible. Group tools by domain and put a tool-routing layer in front of the model that exposes only the subset relevant to the current task. Give tools verb-noun names, and start each description with a one-line summary of what the tool DOES, not what it IS, so that similar tools are easier for the model to tell apart.
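A minimal sketch of such a routing layer, assuming tools are tagged with a domain label and the caller already knows which domains the current task needs. The `Tool` class, `select_tools` function, and `MAX_ACTIVE_TOOLS` constant are hypothetical names for illustration, not part of any particular framework.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Assumption: the cap mirrors the "under 20" guidance above; tune per model.
MAX_ACTIVE_TOOLS = 20


@dataclass(frozen=True)
class Tool:
    name: str         # verb-noun, e.g. "search_code"
    domain: str       # hypothetical grouping label, e.g. "code"
    description: str  # leads with what the tool does


def select_tools(
    registry: Iterable[Tool],
    domains: Iterable[str],
    limit: int = MAX_ACTIVE_TOOLS,
) -> List[Tool]:
    """Return the tools in the requested domains, in registry order.

    Raises ValueError instead of silently exceeding the limit, so an
    oversized tool set is caught before it reaches the model.
    """
    wanted = set(domains)
    selected = [tool for tool in registry if tool.domain in wanted]
    if len(selected) > limit:
        raise ValueError(
            f"{len(selected)} tools selected for domains {sorted(wanted)}; "
            f"limit is {limit}"
        )
    return selected
```

Failing loudly when the cap is exceeded is a design choice: the failure described in this report is silent, so the router is a convenient place to make it visible.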
Journey Context:
Adding tools feels free: each one is just a few lines of JSON. But LLM tool selection is an attention problem, because the model must attend to the correct tool definition among all candidates. Research and production experience show that accuracy degrades gradually up to roughly 20 tools, then drops sharply. The failure mode is subtle: the model doesn't fail outright; it picks a plausible-but-wrong tool, which leads to silent errors. Overlapping descriptions ('search for code' vs 'find code' vs 'query codebase') make this worse. The non-linear cliff is the gotcha: you assume adding one more tool is fine because the last five were fine, but that one pushes you over the edge.
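One rough way to catch overlapping descriptions before they reach the model is a lexical check over the tool registry. The sketch below uses Jaccard similarity on word sets; the function names, stopword list, and 0.3 threshold are assumptions for illustration. It is a crude proxy: it flags 'search for code' vs 'find code' because they share a word, but it misses pure synonyms such as 'query codebase', which would need a semantic comparison (for example, embeddings).

```python
import re
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Assumption: a tiny stopword list is enough for short tool descriptions.
STOPWORDS = {"a", "an", "the", "for", "of", "in", "to", "and", "or"}


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of non-stopword tokens."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def description_overlap(a: str, b: str) -> float:
    """Jaccard similarity of the two descriptions' token sets (0.0 to 1.0)."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def find_overlapping(
    descriptions: Dict[str, str], threshold: float = 0.3
) -> List[Tuple[str, str, float]]:
    """Return (name_a, name_b, score) for every pair at or above the threshold."""
    flagged = []
    for (name_a, desc_a), (name_b, desc_b) in combinations(descriptions.items(), 2):
        score = description_overlap(desc_a, desc_b)
        if score >= threshold:
            flagged.append((name_a, name_b, score))
    return flagged
```

Running `find_overlapping` over the full registry whenever a tool is added gives an early warning that a new description is too close to an existing one.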
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:33:03.650292+00:00— report_created — created