Report #79332
[gotcha] Agent picks wrong tool or hallucinates non-existent tool names when many tools are registered
Cap actively loaded tools at ~20. Group tools into domain-specific subsets and load only the relevant subset per task. Implement a two-stage router: first expose a lightweight meta-tool that returns the available tool categories, then load the specific tools for the chosen category on demand.
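A minimal sketch of the two-stage router described above, in plain Python and not tied to any particular agent framework. The names `Tool`, `ToolRouter`, `list_categories`, `load_category`, and `MAX_ACTIVE_TOOLS` are hypothetical and chosen only for illustration; the cap of 20 is the rough guideline from this report, not a measured limit. The sketch also rejects calls to tools outside the loaded subset, so a fabricated tool name raises an error instead of passing silently.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumed cap, taken from the ~20 guideline in this report.
MAX_ACTIVE_TOOLS = 20


@dataclass
class Tool:
    name: str
    description: str
    fn: Callable[..., object]


@dataclass
class ToolRouter:
    # Hypothetical registry layout: category name -> tools in that category.
    categories: Dict[str, List[Tool]] = field(default_factory=dict)
    # Tools currently exposed to the model, keyed by tool name.
    active: Dict[str, Tool] = field(default_factory=dict)

    def register(self, category: str, tool: Tool) -> None:
        self.categories.setdefault(category, []).append(tool)

    def list_categories(self) -> List[str]:
        """Stage 1 meta-tool: the only thing exposed to the model up front."""
        return sorted(self.categories)

    def load_category(self, category: str) -> List[str]:
        """Stage 2: replace the active set with one category's tools."""
        if category not in self.categories:
            raise KeyError(f"unknown category: {category!r}")
        tools = self.categories[category]
        if len(tools) > MAX_ACTIVE_TOOLS:
            raise ValueError(
                f"category {category!r} has {len(tools)} tools; "
                f"split it to stay under {MAX_ACTIVE_TOOLS}"
            )
        self.active = {t.name: t for t in tools}
        return sorted(self.active)

    def call(self, name: str, **kwargs: object) -> object:
        """Dispatch a tool call, rejecting names outside the active set."""
        if name not in self.active:
            raise LookupError(
                f"tool {name!r} is not loaded; active tools: {sorted(self.active)}"
            )
        return self.active[name].fn(**kwargs)


# Usage: register tools by domain, then load only what the task needs.
router = ToolRouter()
router.register("files", Tool("read_file", "Read a file", lambda path: f"contents of {path}"))
router.register("math", Tool("add", "Add two numbers", lambda a, b: a + b))

router.load_category("math")
result = router.call("add", a=2, b=3)
```

Loading a category replaces the active set rather than adding to it, which keeps the number of exposed tools bounded by the largest single category. Whether to replace or merge subsets is a design choice that depends on how often a task spans several domains.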
Journey Context:
Models have bounded attention over tool definition lists. Beyond roughly 20 tools, selection accuracy drops sharply: the model confuses similarly named tools, grabs the first vaguely relevant one, or fabricates a tool name that doesn't exist. Adding more tools feels like it should increase capability, but it actually reduces reliability. The failure is silent: the model confidently calls the wrong tool with no indication that anything is wrong.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:45:27.921214+00:00— report_created — created