Report #67802
[synthesis] Identical ambiguous tool calls produce divergent model behaviors — some guess, some ask, some hallucinate a match
When multiple tools could match a user request, never rely on model inference alone. Add explicit routing instructions to the system prompt (e.g., 'if the user asks about X, always use tool_x'). For Claude, add a disambiguation instruction like 'if uncertain which tool fits, ask the user.' For GPT-4o, add negative constraints like 'do NOT use tool_y for purpose Z.' For Gemini, validate that the selected tool's parameters match expectations before executing.
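A minimal sketch of the two mitigations above: rendering explicit routing rules for the system prompt, and validating a model-selected tool call before executing it. It is not tied to any provider SDK. The `ToolSpec` class, the tool names `search_orders` and `issue_refund`, and the `{"name": ..., "arguments": ...}` call shape are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToolSpec:
    """Hypothetical tool description; not part of any provider SDK."""
    name: str
    purpose: str
    required_params: dict[str, type]
    never_use_for: list[str] = field(default_factory=list)


def build_routing_prompt(tools: list[ToolSpec]) -> str:
    """Render explicit routing rules to append to the system prompt."""
    lines = ["Tool routing rules:"]
    for tool in tools:
        lines.append(f"- If the user asks about {tool.purpose}, always use {tool.name}.")
        for banned in tool.never_use_for:
            lines.append(f"- Do NOT use {tool.name} for {banned}.")
    lines.append("- If uncertain which tool fits, ask the user.")
    return "\n".join(lines)


def validate_tool_call(call: dict, tools: list[ToolSpec]) -> list[str]:
    """Return a list of problems; an empty list means the call passed the checks."""
    by_name = {t.name: t for t in tools}
    spec = by_name.get(call.get("name"))
    if spec is None:
        return [f"unknown tool: {call.get('name')!r}"]
    args = call.get("arguments")
    if not isinstance(args, dict):
        return ["arguments must be an object"]
    problems = []
    for param, expected in spec.required_params.items():
        if param not in args:
            problems.append(f"missing parameter: {param}")
        elif not isinstance(args[param], expected):
            problems.append(f"wrong type for {param}: expected {expected.__name__}")
    for param in args:
        if param not in spec.required_params:
            problems.append(f"unexpected parameter: {param}")
    return problems


# Illustrative tool set (assumed names and parameters).
TOOLS = [
    ToolSpec("search_orders", "order status", {"order_id": str}, ["refund requests"]),
    ToolSpec("issue_refund", "refund requests", {"order_id": str, "amount": float}),
]
```

In this sketch the agent would call `validate_tool_call` on every tool call the model returns and, if the list is non-empty, re-prompt or ask the user instead of executing. The same check can run for every backend, not only the one where malformed calls were observed.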
Journey Context:
Claude tends to either ask the user for clarification or pick the most narrowly scoped tool when multiple tools could apply. GPT-4o tends to select the first or most frequently used matching tool and proceed without asking. Gemini sometimes selects a tangentially related tool or produces a malformed tool call. As a result, the same agent code behaves inconsistently across model backends. The common mistake is writing tool definitions on the assumption that the model will always pick the 'obvious' one, but each model has different tool-selection heuristics shaped by its training data and alignment tuning. The right call is to reduce ambiguity in tool definitions (make tool purposes non-overlapping) AND add explicit routing logic in the system prompt, because neither alone is sufficient across all providers.
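One rough way to spot overlapping tool purposes before they reach a model is to compare the wording of tool descriptions. The sketch below flags pairs whose description keywords overlap heavily, using Jaccard similarity. The stopword list, the 0.5 threshold, and the example tool names are assumptions, and keyword overlap is only a crude proxy for semantic overlap.

```python
import re
from itertools import combinations

# Small illustrative stopword list (an assumption, not exhaustive).
STOPWORDS = {"a", "an", "the", "for", "of", "to", "and", "or", "by", "in", "on", "with"}


def keywords(description: str) -> set[str]:
    """Lowercase the description and keep its non-stopword words."""
    return {w for w in re.findall(r"[a-z]+", description.lower()) if w not in STOPWORDS}


def overlapping_tools(
    descriptions: dict[str, str], threshold: float = 0.5
) -> list[tuple[str, str, float]]:
    """Return (tool_a, tool_b, score) for each pair at or above the threshold."""
    flagged = []
    for (name_a, desc_a), (name_b, desc_b) in combinations(descriptions.items(), 2):
        a, b = keywords(desc_a), keywords(desc_b)
        union = a | b
        # Jaccard similarity: shared keywords divided by all keywords.
        score = len(a & b) / len(union) if union else 0.0
        if score >= threshold:
            flagged.append((name_a, name_b, round(score, 2)))
    return flagged


# Hypothetical tool descriptions for illustration.
descriptions = {
    "search_orders": "Look up the status of an order by order id",
    "get_order": "Look up an order by order id",
    "issue_refund": "Refund a payment to the customer",
}
flagged = overlapping_tools(descriptions)
```

Here `search_orders` and `get_order` are flagged as candidates for merging or for sharper descriptions, while `issue_refund` is not. A flagged pair is a prompt for human review, not proof that a model will confuse the two tools.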
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:17:20.846357+00:00— report_created — created