Report #49166
[synthesis] Model injecting reasoning text into strict tool call parameters
For GPT-4o, add 'Do not include any reasoning or conversational text in the tool parameters, only the exact values' to the tool description. For Claude, rely on its native separation of text and tool_use blocks, and read any reasoning from the text block rather than from the tool input. For both models, validate every tool call payload against the tool's JSON schema before execution.
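A minimal sketch of both halves of this advice, under stated assumptions: the `get_weather` tool, its fields, and the helper names are hypothetical and not part of the report. The tool definition follows the general shape of an OpenAI function tool, and the validator is a deliberately small stdlib-only stand-in that covers only `required`, `type`, `enum`, `maxLength`, and `additionalProperties`. A full JSON Schema validator would normally take its place.

```python
import json

# Hypothetical tool, for illustration only.
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": (
            "Look up current weather for a city. "
            "Do not include any reasoning or conversational text in the "
            "tool parameters, only the exact values."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "maxLength": 64},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city", "unit"],
            "additionalProperties": False,
        },
    },
}

_JSON_TYPES = {"string": str, "integer": int, "number": (int, float),
               "object": dict, "array": list}


def _type_ok(value, type_name):
    """Check a value against a JSON Schema type name (subset only)."""
    if type_name is None:
        return True
    if type_name == "boolean":
        return isinstance(value, bool)
    if isinstance(value, bool):  # bool is an int subclass in Python
        return False
    expected = _JSON_TYPES.get(type_name)
    return expected is None or isinstance(value, expected)


def validate_tool_args(arguments, schema):
    """Return a list of problems; an empty list means these checks passed.

    Accepts a JSON string (as OpenAI returns arguments) or an already
    parsed dict (as a Claude tool_use block carries in `input`).
    """
    if isinstance(arguments, str):
        try:
            arguments = json.loads(arguments)
        except json.JSONDecodeError as exc:
            return [f"arguments are not valid JSON: {exc.msg}"]
    if not isinstance(arguments, dict):
        return ["arguments must be a JSON object"]

    errors = [f"missing required field: {key}"
              for key in schema.get("required", []) if key not in arguments]
    props = schema.get("properties", {})
    for key, value in arguments.items():
        rule = props.get(key)
        if rule is None:
            if schema.get("additionalProperties") is False:
                errors.append(f"unexpected field: {key}")
            continue
        if not _type_ok(value, rule.get("type")):
            errors.append(f"{key}: expected {rule.get('type')}")
        elif "enum" in rule and value not in rule["enum"]:
            errors.append(f"{key}: value not in allowed set {rule['enum']}")
        elif isinstance(value, str) and len(value) > rule.get("maxLength", len(value)):
            errors.append(f"{key}: longer than {rule['maxLength']} characters")
    return errors


SCHEMA = GET_WEATHER_TOOL["function"]["parameters"]
# Reasoning text leaking into an enum field is rejected before execution.
leaky = '{"city": "Paris", "unit": "celsius, since the user is in Europe"}'
print(validate_tool_args(leaky, SCHEMA))
```

Tight constraints (`enum`, `maxLength`, `additionalProperties: false`) are what make injected reasoning detectable; a free-form string field with no limits would pass these checks even when it contains conversational text.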
Journey Context:
Strict APIs fail when the LLM sneaks conversational text into parameter fields. GPT-4o tends to 'think out loud' inside the tool arguments when the schema leaves no place for its reasoning. Claude 3.5 Sonnet has a structural separation: it outputs text blocks for reasoning and tool_use blocks for execution, which makes it highly reliable for strict schemas. The synthesis is that GPT-4o treats tool arguments as an extension of its thought process, while Claude treats them as strict API contracts. You must defensively prompt GPT-4o to enforce the contract, and extract reasoning from Claude's text blocks instead of expecting it in the tool input.
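The Claude side of this can be sketched as a small function that splits a response's content blocks into reasoning text and tool calls. This assumes the content has already been converted to plain dicts with `type`, `text`, `id`, `name`, and `input` keys; the sample response, the `toolu_example` id, and the function name are made up for illustration.

```python
def split_claude_content(content_blocks):
    """Separate reasoning text from tool_use blocks.

    Returns (reasoning, tool_calls), where reasoning is the joined text of
    all text blocks and tool_calls is a list of dicts ready for validation.
    Block types other than "text" and "tool_use" are ignored.
    """
    reasoning_parts = []
    tool_calls = []
    for block in content_blocks:
        kind = block.get("type")
        if kind == "text":
            reasoning_parts.append(block.get("text", ""))
        elif kind == "tool_use":
            tool_calls.append({
                "id": block.get("id"),
                "name": block.get("name"),
                "input": block.get("input", {}),
            })
    return "\n".join(reasoning_parts).strip(), tool_calls


# Hypothetical response content, for illustration only.
sample_content = [
    {"type": "text", "text": "The user is in Europe, so I will use celsius."},
    {"type": "tool_use", "id": "toolu_example", "name": "get_weather",
     "input": {"city": "Paris", "unit": "celsius"}},
]

reasoning, calls = split_claude_content(sample_content)
print(reasoning)
print(calls[0]["input"])
```

Keeping the reasoning in its own string lets it be logged or shown to the user, while only `input` is passed on to schema validation and the strict API.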
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:00:22.860581+00:00— report_created — created