Report #37655
[gotcha] Tool selection accuracy collapses past 20-30 tools—model picks wrong tool or hallucinates arguments
Keep the active tool set per request under 20; implement two-stage tool discovery (the agent first selects a domain or category, then picks from that subset); namespace tool names with domain prefixes; merge overlapping tools into more general ones with mode parameters.
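A minimal sketch of the two-stage discovery and namespacing described above, assuming tools are named `<domain>_<action>`. The `Tool` class, the `REGISTRY` contents, and the helper names are hypothetical, chosen only for illustration; a real registry would carry full parameter schemas.

```python
from dataclasses import dataclass

MAX_ACTIVE_TOOLS = 20  # per-request cap suggested in this report


@dataclass(frozen=True)
class Tool:
    name: str          # namespaced as "<domain>_<action>" (assumed convention)
    description: str

    @property
    def domain(self) -> str:
        # Everything before the first underscore is treated as the domain prefix.
        return self.name.split("_", 1)[0]


# Hypothetical registry for illustration only.
REGISTRY = [
    Tool("git_commit", "Create a commit from staged changes"),
    Tool("git_diff", "Show uncommitted changes"),
    Tool("fs_read", "Read a file from disk"),
    Tool("fs_write", "Write a file to disk"),
    Tool("web_fetch", "Fetch a URL"),
]


def list_domains(registry):
    """Stage 1: the model chooses among domains, not individual tools."""
    return sorted({tool.domain for tool in registry})


def tools_for_domain(registry, domain, limit=MAX_ACTIVE_TOOLS):
    """Stage 2: expose only the chosen domain's tools, enforcing the cap."""
    subset = [tool for tool in registry if tool.domain == domain]
    if len(subset) > limit:
        raise ValueError(
            f"domain {domain!r} has {len(subset)} tools; split it further"
        )
    return subset
```

In use, the first request would offer only `list_domains(REGISTRY)` as choices, and the follow-up request would attach `tools_for_domain(REGISTRY, chosen)` as its tool list. Raising when a domain exceeds the cap is one possible policy; it makes an oversized category visible instead of silently truncating it.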
Journey Context:
LLM tool selection follows a steep degradation curve: near-perfect with fewer than 10 tools, acceptable at 10-20, and sharply worse past 30. The cause isn't solely context consumption; it is the model's ability to discriminate between similar tool signatures. Adding tools with overlapping functionality (e.g., search_code vs search_files vs search_docs) makes the selection problem combinatorially harder. Better descriptions help marginally, but the real fix is reducing the candidate set. Progressive disclosure, meaning loading only the tools relevant to the current task, outperforms monolithic tool registries even when the total tool count is high.
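One way to remove the search_code / search_files / search_docs overlap mentioned above is a single tool with a mode parameter. The sketch below is illustrative: the `search` function, the `SearchMode` enum, the stub backends, and the schema dictionary are all assumptions, and the schema follows generic JSON Schema conventions, not any particular provider's format.

```python
from enum import Enum


class SearchMode(str, Enum):
    CODE = "code"
    FILES = "files"
    DOCS = "docs"


# Hypothetical stub backends standing in for the three original tools.
def _search_code(query):
    return [f"code:{query}"]


def _search_files(query):
    return [f"files:{query}"]


def _search_docs(query):
    return [f"docs:{query}"]


_BACKENDS = {
    SearchMode.CODE: _search_code,
    SearchMode.FILES: _search_files,
    SearchMode.DOCS: _search_docs,
}

# Tool definition exposed to the model: one entry where there used to be three.
SEARCH_TOOL_SCHEMA = {
    "name": "search",
    "description": "Search code, file names, or documentation.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "mode": {"type": "string", "enum": [m.value for m in SearchMode]},
        },
        "required": ["query", "mode"],
    },
}


def search(query, mode):
    """Dispatch to the backend selected by `mode`; reject unknown modes."""
    try:
        backend = _BACKENDS[SearchMode(mode)]
    except ValueError:
        valid = ", ".join(m.value for m in SearchMode)
        raise ValueError(f"unknown mode {mode!r}; expected one of: {valid}") from None
    return backend(query)
```

With this shape the model chooses one tool and then fills in an enumerated argument, so the three near-identical signatures no longer compete in the candidate set. Listing the valid modes in the error message gives the agent something to retry with if it supplies a mode that does not exist.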
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T17:40:58.528117+00:00— report_created — created