Report #76346
[synthesis] Models select the wrong tool or hallucinate a hybrid when provided multiple tools with overlapping capabilities
Differentiate tool names and descriptions aggressively. Avoid generic names like `search`; use specific ones such as `search_codebase` and `search_web`. For Claude, place the preferred tool first in the schema list, since it uses definition order as a tie-breaker; for GPT-4o, give each description mutually exclusive trigger conditions.
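A minimal sketch of the advice above, assuming a single query-only parameter schema and hypothetical description wording. The `preferred` flag and the helpers `ordered`, `to_anthropic`, and `to_openai` are illustrative names, not part of any SDK; only the output dict shapes (`input_schema` for Anthropic-style tools, `type`/`function`/`parameters` for OpenAI-style tools) follow the vendors' published tool formats.

```python
from typing import Any

# Hypothetical tool specs. Each description states when to use the tool
# AND when not to, so the trigger conditions do not overlap.
TOOLS: list[dict[str, Any]] = [
    {
        "name": "search_web",
        "description": (
            "Search the public internet. Use ONLY for questions about "
            "external documentation, news, or third-party libraries. "
            "Never use for files in the local repository."
        ),
        "preferred": False,
    },
    {
        "name": "search_codebase",
        "description": (
            "Search files in the local repository. Use ONLY for questions "
            "about code, symbols, or files in this project. "
            "Never use for public web content."
        ),
        "preferred": True,
    },
]

# Assumed shared parameter schema (plain JSON Schema).
QUERY_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {"query": {"type": "string"}},
    "required": ["query"],
}


def ordered(tools: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Put preferred tools first; sorted() is stable, so ties keep their order."""
    return sorted(tools, key=lambda t: not t["preferred"])


def to_anthropic(tools: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Build Anthropic-style tool definitions with the preferred tool first."""
    return [
        {"name": t["name"], "description": t["description"], "input_schema": QUERY_SCHEMA}
        for t in ordered(tools)
    ]


def to_openai(tools: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Build OpenAI-style function tool definitions from the same specs."""
    return [
        {
            "type": "function",
            "function": {
                "name": t["name"],
                "description": t["description"],
                "parameters": QUERY_SCHEMA,
            },
        }
        for t in ordered(tools)
    ]
```

Keeping one neutral spec and deriving both payloads from it means the names and descriptions cannot drift apart between providers, while the ordering step is applied in one place.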
Journey Context:
When an agent has several similar tools (e.g., `search_github_issues` vs. `search_github_code`), models diverge in how they resolve the ambiguity. GPT-4o matches the user prompt to tool descriptions by semantic similarity, so it often oscillates or picks the wrong tool when the descriptions are too similar. Claude 3.5 Sonnet falls back to the order in which tools appear in the API request, heavily favoring the first tool defined. Gemini may fail to call any tool and simply answer from parametric memory. Writing 'clever' overlapping descriptions fails across the board. The fix is mutually exclusive descriptions; when overlap is unavoidable, working with Claude's positional bias and GPT-4o's semantic matching keeps selection from being effectively random.
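One way to catch descriptions that are "too similar" before they ship is a simple lint over the tool list. The sketch below uses Jaccard similarity on word sets as a rough lexical proxy; it is not how any model actually scores tools, and the stopword list, the 0.5 threshold, and the example descriptions are all assumptions chosen for illustration.

```python
import re
from itertools import combinations

# Assumed stopword list; tune for your own descriptions.
STOPWORDS = {"the", "a", "an", "for", "in", "of", "to", "and", "or", "use", "when", "this"}


def tokens(text: str) -> set[str]:
    """Lowercase word set with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9_]+", text.lower()) if w not in STOPWORDS}


def jaccard(a: str, b: str) -> float:
    """Share of distinct words that two descriptions have in common."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def overlapping_pairs(
    descriptions: dict[str, str], threshold: float = 0.5
) -> list[tuple[str, str, float]]:
    """Return (tool_a, tool_b, score) for every pair at or above the threshold."""
    flagged = []
    for (n1, d1), (n2, d2) in combinations(descriptions.items(), 2):
        score = jaccard(d1, d2)
        if score >= threshold:
            flagged.append((n1, n2, round(score, 2)))
    return flagged


# Hypothetical descriptions: the first pair is nearly identical,
# the second pair names distinct trigger conditions.
vague = {
    "search_github_issues": "Search GitHub for relevant results",
    "search_github_code": "Search GitHub for relevant code results",
}
distinct = {
    "search_github_issues": "Find bug reports and feature requests by title or label",
    "search_github_code": "Locate source files containing a symbol or string",
}
```

Calling `overlapping_pairs(vague)` flags the pair, while `overlapping_pairs(distinct)` returns an empty list. A lexical check like this only catches near-duplicate wording; descriptions can still overlap in meaning while sharing few words, so it complements manual review rather than replacing it.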
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:44:21.686188+00:00— report_created — created