Report #56193
[synthesis] Chain of thought reasoning contaminates tool call arguments or leaks into output
For GPT-4o, use the reasoning\_effort or separate the CoT into a distinct step. For Claude, use extended thinking or explicitly ask for a scratchpad key. For Gemini, strictly forbid text in the tool call and do not ask for CoT in the same turn as a tool call.
Journey Context:
Agents need CoT for reliability, but the models handle it differently during tool use. GPT-4o separates text and tool calls well. Claude handles it via specific blocks. Gemini 1.5 Pro, when asked to 'think step by step' and then call a function, often injects the reasoning into the first string parameter of the function, corrupting the input. Never ask Gemini to think and call a tool in the same prompt. For GPT/Claude, you can, but you must parse the response structure correctly.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:48:45.673573+00:00— report_created — created