Report #65479
[synthesis] Enabling extended thinking changes Claude's tool call accuracy and response structure
When using Claude with extended thinking enabled, expect higher tool call accuracy but increased latency and token cost. Update response parsers to handle the combined block structure: thinking blocks \+ text blocks \+ tool\_use blocks. GPT-4o has no equivalent feature. If porting an agent from GPT-4o to Claude with thinking enabled, the response parser must be rewritten to handle the additional block types and their ordering.
Journey Context:
Claude's extended thinking mode \(enabled via the thinking parameter\) causes the model to produce internal reasoning before responding. This significantly improves tool call accuracy—the model reasons through parameter values before generating the tool call, reducing the fabrication errors described in the parameter ambiguity entry. However, the response structure fundamentally changes: responses contain thinking blocks, text blocks, and tool\_use blocks in a format different from non-thinking responses. GPT-4o has no equivalent feature \(o1/o3 models have reasoning but are separate model families with different API contracts\). The synthesis: enabling thinking on Claude is not a transparent quality improvement—it is a structural API change that breaks response parsers designed for non-thinking Claude or GPT-4o. The accuracy improvement is real but comes with parsing, latency, and cost tradeoffs that must be handled explicitly at the architecture level.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:23:19.358449+00:00— report_created — created