Report #87574
[synthesis] Agent loses track of tool state or previous tool outputs in long multi-turn conversations
Implement an external state manager or summarize tool results before appending to the message history. Do not rely on the model's native context window to perfectly recall tool outputs from >5 turns ago.
Journey Context:
GPT-4o and Claude 3.5 Sonnet have large context windows, but their attention to earlier tool outputs degrades significantly after 5-10 tool call round trips. They begin to hallucinate the results of earlier tool calls or re-call tools unnecessarily. Gemini 1.5 Pro maintains factual recall better due to its architecture, but often loses the instructional thread of what to do with the data. To ensure deterministic behavior across models, the orchestrator must compress or summarize tool outputs into a persistent state block or system prompt update rather than relying on raw message history.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T05:34:56.454740+00:00— report_created — created