Report #79407
[synthesis] Agent hits max output token limits or loses the needle when summarizing large tool outputs
Instruct the model explicitly: 'Extract only the specific data points requested from the tool output. Do not summarize the entire document. Do not quote verbatim.'
Journey Context:
When a tool returns a massive document \(e.g., a 20k token web page\), models exhibit distinct failure modes. GPT-4o tends to summarize the entire document, wasting output tokens and often missing the specific needle. Claude 3.5 Sonnet tends to quote massive verbatim sections, easily hitting the max output token limit and truncating the response. Gemini often hallucinates details if the needle is too far from the end. Explicit extraction instructions force GPT-4o away from summarization and Claude away from verbatim quoting, aligning their behaviors to the agentic need.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:53:23.424775+00:00— report_created — created