Report #87299
[synthesis] Model hallucinates the end of a massive tool output or fails to summarize long API responses
Truncate or summarize tool outputs programmatically in the agent loop before passing them back to the LLM; never pass a raw API response larger than roughly 10k tokens into the model's context window without curation.
Journey Context:
When a tool (like a web scraper or database query) returns a massive payload, models reportedly behave differently. GPT-4o attempts to process the entire context but often hallucinates details from the very end of the text (recency bias loss). Claude 3.5 Sonnet tends to summarize the middle of the text, losing fine details but maintaining coherence. Gemini 1.5 Pro will often throw a context length error or fail silently if the output is too large. Relying on the model to 'read' a 50k-token tool output is unreliable across all three. The agent must programmatically truncate or map-reduce the tool output before it reaches the model.
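A minimal sketch of the two options named above (truncation and map-reduce), written in Python. Everything here is an assumption for illustration, not part of the original report: the function names, the 4-characters-per-token estimate, and the `summarize` callback, which stands in for whatever cheap summarizer or smaller model call the agent loop has available.

```python
from typing import Callable, List

# Assumption: roughly 4 characters per token. Swap in a real tokenizer
# if the budget needs to be exact.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Crude token estimate based on character count (rounded up)."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def truncate_middle(text: str, max_tokens: int = 10_000) -> str:
    """Keep the head and tail of an oversized output and mark the cut.

    The explicit marker tells the model that content is missing, so it
    has less reason to invent the omitted part. The marker itself adds
    a few characters on top of the budget.
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    if estimate_tokens(text) <= max_tokens:
        return text
    budget = max_tokens * CHARS_PER_TOKEN
    head = budget // 2
    tail = budget - head
    omitted = len(text) - budget
    marker = f"\n[... {omitted} characters omitted by agent loop ...]\n"
    return text[:head] + marker + text[-tail:]


def split_chunks(text: str, chunk_tokens: int) -> List[str]:
    """Split text into fixed-size character chunks."""
    size = chunk_tokens * CHARS_PER_TOKEN
    return [text[i:i + size] for i in range(0, len(text), size)]


def map_reduce(
    text: str,
    summarize: Callable[[str], str],  # hypothetical summarizer callback
    chunk_tokens: int = 2_000,
    max_tokens: int = 10_000,
) -> str:
    """Summarize each chunk (map), then join the summaries (reduce)."""
    if estimate_tokens(text) <= max_tokens:
        return text
    partials = [summarize(chunk) for chunk in split_chunks(text, chunk_tokens)]
    combined = "\n".join(partials)
    # If the joined summaries are still too large, fall back to truncation
    # instead of recursing indefinitely.
    return truncate_middle(combined, max_tokens)
```

In an agent loop, the tool result would pass through one of these functions before being appended to the message history, for example `messages.append({"role": "tool", "content": truncate_middle(raw_output)})`. Truncation is cheap but drops the middle of the payload; map-reduce keeps coverage of the whole payload at the cost of one summarizer call per chunk.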
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T05:07:19.104234+00:00— report_created — created