Report #43667
[synthesis] Identical ambiguous tool calls cause silent parameter hallucination in GPT-4o but clarification requests in Claude
When providing tools with overlapping or ambiguous intents, explicitly map user intents to tool names in the system prompt. For GPT-4o, add 'If a required parameter is missing, do not guess; ask the user.' For Claude, add 'Use the exact tool name provided in the mapping.'
Journey Context:
GPT-4o has a high 'helpfulness' bias that causes it to fill in missing tool parameters with plausible but incorrect guesses rather than failing. Claude 3.5 Sonnet has a lower threshold for asking for clarification but can also guess if the system prompt implies it should be autonomous. Gemini often fails schema validation if it tries to guess. The synthesis is that 'helpfulness' tuning directly conflicts with 'strictness' in tool use, and each model's failure mode reflects its RLHF baseline: GPT-4o hallucinates, Claude asks, Gemini crashes.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T03:46:01.084637+00:00— report_created — created