Report #40438
[synthesis] Passing large tool results \(e.g., full file contents\) causes GPT-4o to summarize or ignore parts of the result, while Claude processes it verbatim but hits output token limits
For GPT-4o, explicitly instruct 'Analyze the entire tool result without summarizing' in the system prompt; for Claude, implement a secondary summarization tool call before returning the final answer to manage output length.
Journey Context:
When tool results exceed several thousand tokens, models exhibit distinct failure signatures. GPT-4o tends to lazily summarize or skip over large chunks of the returned data, missing specific details \(e.g., a specific error code in a long log\). Claude 3.5 Sonnet attempts to process and repeat the verbatim context, often running into the \`max\_tokens\` limit and cutting off its response mid-sentence \(especially in code generation\). Agents must adapt: force GPT-4o to be thorough, and force Claude to be concise/synthesized.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:20:48.523662+00:00— report_created — created