Report #83214
[synthesis] Claude extended thinking mode changes tool call timing and response structure, breaking agent expectations
When using Claude with extended thinking (thinking enabled with a budget_tokens value), account for three changes: (1) tool calls appear after the thinking block, not in the first content block; (2) thinking tokens count against max_tokens and are generated before any tool call arguments, so a long thinking block can leave too little room for the arguments that follow it; (3) response latency increases significantly. Parse response blocks dynamically by type, not by position, and adjust agent loop timeouts and token budget calculations accordingly.
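A minimal sketch of "parse by type, not by position", assuming the Messages API response content has already been decoded into a list of dicts, each with a "type" field ("thinking", "redacted_thinking", "text", "tool_use"). The names ParsedResponse, parse_content, should_run_tools, and choose_max_tokens are hypothetical helpers, not SDK functions, and the max_tokens sizing rule is an illustrative assumption rather than an official formula.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ParsedResponse:
    """Content blocks from one Messages API response, grouped by type."""
    thinking: list[str] = field(default_factory=list)
    text: list[str] = field(default_factory=list)
    tool_uses: list[dict[str, Any]] = field(default_factory=list)
    redacted_thinking_count: int = 0


def parse_content(blocks: list[dict[str, Any]]) -> ParsedResponse:
    """Bucket content blocks by their "type" field; block order is never assumed."""
    parsed = ParsedResponse()
    for block in blocks:
        kind = block.get("type")
        if kind == "thinking":
            parsed.thinking.append(block.get("thinking", ""))
        elif kind == "redacted_thinking":
            # Encrypted reasoning: nothing readable, but it is still a block.
            parsed.redacted_thinking_count += 1
        elif kind == "text":
            parsed.text.append(block.get("text", ""))
        elif kind == "tool_use":
            parsed.tool_uses.append(
                {"id": block["id"], "name": block["name"], "input": block.get("input", {})}
            )
        # Any other block type is skipped instead of crashing the agent loop.
    return parsed


def should_run_tools(stop_reason: str | None, parsed: ParsedResponse) -> bool:
    """Dispatch tools only when the model actually stopped to call them.

    A stop_reason of "max_tokens" means generation was cut off (for example,
    because thinking used up most of the allowance), so tool input may be
    incomplete and should not be executed.
    """
    return stop_reason == "tool_use" and bool(parsed.tool_uses)


def choose_max_tokens(budget_tokens: int, answer_reserve: int) -> int:
    """Hypothetical sizing rule: thinking budget plus room for text and tool args."""
    if budget_tokens < 1024:
        raise ValueError("budget_tokens is below the documented minimum of 1024")
    # max_tokens must be larger than budget_tokens, so the reserve must be positive.
    if answer_reserve <= 0:
        raise ValueError("answer_reserve must be positive")
    return budget_tokens + answer_reserve
```

The grouped view is only for your own dispatch logic. When you append the assistant turn before sending tool_result blocks back, pass the original content list unchanged, thinking blocks included, since the API expects them to be preserved during tool use.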
Journey Context:
Extended thinking fundamentally changes the response structure. Without thinking, Claude's response content is typically a text block, possibly followed by one or more tool_use blocks. With thinking enabled, the content starts with a thinking block (which can be very long), usually followed by a text block and then potentially a tool_use block; the text block is not guaranteed, and redacted_thinking blocks can also appear. Agent code that assumes the first content block is the text response will break, because that block may be the thinking block. Additionally, thinking tokens are billed as output tokens and count against max_tokens, and on models that return summarized thinking the visible thinking text can be much shorter than the token count you are billed for, which causes budget surprises. The synthesis: if your agent supports both thinking and non-thinking modes, or switches between Claude and GPT-4o (which has no equivalent structured thinking feature in its API response), you must parse the response structure dynamically based on the model and mode rather than assume a fixed content block order.
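One way to handle the mixed Claude/GPT-4o case is to normalize both providers' responses into a single shape before the agent loop touches them. This is a sketch under stated assumptions: responses are already decoded JSON dicts; the Claude side reads the Messages API content list by block type; the GPT-4o side reads choices[0].message from a Chat Completions response, where tool_calls[].function.arguments is a JSON string. The provider labels "anthropic" and "openai" and the ToolCall, NormalizedTurn, and extract_turn names are hypothetical.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Any


@dataclass
class ToolCall:
    id: str
    name: str
    arguments: dict[str, Any]


@dataclass
class NormalizedTurn:
    text: str
    tool_calls: list[ToolCall]


def turn_from_anthropic(message: dict[str, Any]) -> NormalizedTurn:
    # Scan every block by type: thinking, redacted_thinking, or text may come first.
    blocks = message.get("content", [])
    text = "".join(b.get("text", "") for b in blocks if b.get("type") == "text")
    calls = [
        ToolCall(b["id"], b["name"], b.get("input", {}))
        for b in blocks
        if b.get("type") == "tool_use"
    ]
    return NormalizedTurn(text, calls)


def turn_from_openai(completion: dict[str, Any]) -> NormalizedTurn:
    msg = completion["choices"][0]["message"]
    # content may be None when the model only returns tool calls.
    text = msg.get("content") or ""
    calls = [
        # Chat Completions returns arguments as a JSON-encoded string.
        ToolCall(c["id"], c["function"]["name"], json.loads(c["function"]["arguments"] or "{}"))
        for c in (msg.get("tool_calls") or [])
        if c.get("type") == "function"
    ]
    return NormalizedTurn(text, calls)


def extract_turn(provider: str, payload: dict[str, Any]) -> NormalizedTurn:
    """Route on provider; on Claude the same parser covers thinking and non-thinking modes."""
    if provider == "anthropic":
        return turn_from_anthropic(payload)
    if provider == "openai":
        return turn_from_openai(payload)
    raise ValueError(f"unknown provider: {provider}")
```

Because the Claude parser selects blocks by type, it needs no separate branch for thinking mode; only the provider switch changes the parsing path.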
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:15:39.288343+00:00— report_created — created