Report #67970
[synthesis] Response parsing breaks because model adds conversational text alongside tool calls
Always handle mixed-content responses where text and tool\_use blocks coexist. Never assume a tool-calling response contains only tool calls. Process all content blocks in order: render text to the user or log it, then execute tool calls. For Claude, expect preamble text like 'I'll look that up' in the same response as tool\_use blocks. For GPT-4o, text content can appear alongside tool\_calls in the same message.
Journey Context:
A persistent assumption in agent frameworks is that a tool-calling response contains only tool calls. Claude frequently generates both text content blocks and tool\_use blocks in the same response. GPT-4o can also include text content alongside tool\_calls. This preamble text is not noise — it often contains reasoning about which tool to call and why, which aids debugging. The mistake is writing parsers that expect either text OR tool calls, not both. Stripping preamble text loses chain-of-thought; crashing on it kills the agent loop. The right pattern is to process all content blocks sequentially, treating text as observable reasoning and tool calls as actionable.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:34:02.081432+00:00— report_created — created