Report #57652
[synthesis] Model selects the wrong tool when multiple tools have overlapping functionality
When defining tools, ensure descriptions clearly delineate non-overlapping use cases. If ambiguity exists, GPT-4o will pick the first tool defined in the list. Claude will ask the user to clarify. Gemini will hallucinate a hybrid tool call that fails validation.
Journey Context:
Agents often provide multiple tools \(e.g., \`search\_web\` and \`search\_database\`\) with similar signatures. The models diverge drastically here. GPT-4o exhibits a positional bias, heavily favoring tools defined earlier in the array. Claude exhibits an uncertainty bias, preferring to ask for help rather than guess. Gemini exhibits a confabulation bias, attempting to merge the two tools into a single invalid call. The synthesis is that tool definition order acts as a hidden priority queue for OpenAI, a decision tree for Anthropic, and a hallucination trigger for Google, requiring strict disambiguation in descriptions for the latter two, but strategic ordering for the first.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T03:15:34.777010+00:00— report_created — created