Report #79181
[synthesis] Model selects wrong tool when multiple tools have overlapping capabilities
Invest heavily in tool descriptions with disambiguation: add negative examples ('Use this for local file search, NOT for web search'), include example invocations, and keep tool count per request under 10 for GPT-4o and under 20 for Claude. When possible, use tool_choice to constrain the candidate set before dispatch.
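A minimal sketch of what such descriptions might look like, written as plain Python dicts. The tool names come from the example in this report. The `make_tool` helper, the `name`/`description`/`parameters` layout, and the `query` parameter are assumptions for illustration; exact schema fields differ between providers.

```python
# Hypothetical tool definitions. The field layout is an assumption;
# adapt it to the schema your provider actually expects.

def make_tool(name, purpose, not_for, example):
    """Build a tool definition whose description states the purpose,
    a negative example, and a sample invocation."""
    description = (
        f"{purpose} "
        f"Do NOT use this for {not_for}. "
        f"Example: {example}"
    )
    return {
        "name": name,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }


TOOLS = [
    make_tool(
        "search_codebase",
        "Search files in the local repository.",
        "web search",
        'search_codebase(query="auth middleware")',
    ),
    make_tool(
        "search_web",
        "Search the public web.",
        "local file search",
        'search_web(query="OAuth 2.0 PKCE flow")',
    ),
]
```

Building every description from one helper keeps the negative example and the sample call from being dropped when new tools are added.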
Journey Context:
When an agent exposes tools like 'search_codebase' and 'search_web', and the user says 'search for the auth pattern', models resolve the ambiguity differently. Claude relies heavily on tool description text for selection; if descriptions are vague, it often asks for clarification (stalling the agent loop). GPT-4o tends to pick the first-listed tool that partially matches, and its selection accuracy degrades faster as tool count increases. The cross-model insight: tool descriptions are the primary selection signal for both models, but each model needs a different strategy. Claude benefits from explicit disambiguation text and example invocations in descriptions. GPT-4o benefits from a smaller candidate set (use tool_choice or dynamically filter tools before each call). Both benefit from negative examples in descriptions. This is a case where the same root cause (ambiguous tools) produces different failure signatures per model.
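The dynamic filtering mentioned above could be sketched roughly as follows. This is an illustrative heuristic, not a method confirmed by the report: the `select_candidates` function, the per-tool `keywords` field, and the keyword-overlap scoring are all hypothetical, and the caps simply encode the "under 10" and "under 20" guidance from this report.

```python
import re

# Caps derived from this report's guidance ("under 10", "under 20").
# Treat them as starting points, not measured limits.
MAX_TOOLS = {"gpt-4o": 9, "claude": 19}
DEFAULT_CAP = 9  # assumption: be conservative for unknown model families


def select_candidates(tools, user_message, model_family):
    """Keep tools whose hypothetical `keywords` overlap the message,
    best match first, truncated to the cap for the model family.
    If nothing matches, fall back to the full list (still capped)."""
    words = set(re.findall(r"[a-z0-9_]+", user_message.lower()))
    scored = [(len(words & set(t["keywords"])), t) for t in tools]
    # sorted() is stable, so ties keep their original listing order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    matching = [tool for score, tool in ranked if score > 0]
    candidates = matching or list(tools)
    return candidates[: MAX_TOOLS.get(model_family, DEFAULT_CAP)]


tools = [
    {"name": "search_codebase",
     "keywords": ["repo", "codebase", "file", "files", "local"]},
    {"name": "search_web",
     "keywords": ["web", "internet", "online"]},
]

# An explicit cue narrows the set to a single tool.
narrowed = select_candidates(tools, "search the web for the auth pattern", "gpt-4o")

# The ambiguous request from this report matches no keyword, so both tools
# remain and the model must rely on the description text.
ambiguous = select_candidates(tools, "search for the auth pattern", "gpt-4o")
```

The `keywords` field exists only for local filtering and would need to be stripped before the tool list is sent to a provider.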
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:30:09.561367+00:00— report_created — created