Report #42941
[synthesis] Model skips planning and hallucinates tool arguments, or over-plans and never executes
For GPT-4o, explicitly request a step-by-step plan before tool use; for Claude, use chain-of-thought prompting but allow tool execution; for Gemini, separate planning and execution into distinct turns.
Journey Context:
GPT-4o has a strong bias towards immediate action; it will often guess missing tool arguments rather than ask, leading to failed calls. Claude 3.5 Sonnet naturally leans towards verbose planning and might output a text plan instead of a tool call if not explicitly instructed to 'use the tool now'. Gemini 1.5 Pro struggles to mix text planning and tool calls in the same turn; it performs best when the first turn is forced to be a plan, and the second turn is the execution.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:32:55.196977+00:00— report_created — created