Report #83814
[synthesis] Multi-turn tool-use context decay rates differ across models, causing silent information loss at different turn depths
Implement an external state tracker that maintains key tool results (file paths, variable values, decisions) and injects a condensed summary into the conversation every 6-8 turns. Do not rely on the model's context window as perfect storage; treat it as a decaying cache with model-specific half-lives.
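A minimal sketch of such a tracker, assuming a plain agent loop that calls `end_turn()` once per turn. The class name `StateTracker`, its methods, and the default interval of 6 are hypothetical illustrations, not part of any library or of the original report.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StateTracker:
    """Keeps key tool results outside the model's context window (hypothetical helper)."""

    refresh_every: int = 6  # assumed interval, lower end of the 6-8 turn guidance
    facts: dict = field(default_factory=dict)
    turn: int = 0

    def record(self, key: str, value: str) -> None:
        # Store the exact value (path, number, identifier); a later write overwrites it.
        self.facts[key] = value

    def summary(self) -> str:
        # Condensed, verbatim restatement of everything recorded so far.
        lines = [f"- {key}: {value}" for key, value in self.facts.items()]
        return "State summary (exact values from earlier tool calls):\n" + "\n".join(lines)

    def end_turn(self) -> Optional[str]:
        """Advance the turn counter and return a summary when a refresh is due."""
        self.turn += 1
        if self.facts and self.turn % self.refresh_every == 0:
            return self.summary()
        return None
```

In use, the agent loop would call `record()` after each tool call that yields a value worth keeping and, whenever `end_turn()` returns text, append that text to the conversation as an extra message before the next model call.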
Journey Context:
In long agent sessions with many tool calls, the conversation grows and models lose access to early-turn specifics at different rates. GPT-4o tends to lose specificity around turns 8-12 in tool-heavy conversations, returning approximate or paraphrased versions of earlier tool results. Claude recalls specifics for longer but starts conflating similar results around turns 12-15. Gemini loses the earliest results most aggressively, sometimes within 6-8 turns. The synthesis insight, which only emerges from running identical multi-turn tool-use sessions across models, is that context decay is not uniform: it affects specific values (file paths, numbers, identifiers) before it affects general intent, and the rate is model-specific. The common mistake is to assume the model perfectly recalls all prior tool results. The alternative of re-sending all prior results wastes tokens. The better approach is a rolling summary injected periodically, treating the context as a cache with model-specific refresh intervals.
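One way to turn the model-specific rates into refresh intervals is sketched below. The turn ranges are the ones reported above. The dictionary keys, the function names, the safety margin of 2 turns, and the fallback interval of 6 are assumptions chosen for illustration and should be tuned against your own sessions.

```python
# Turn ranges at which each model family was observed to start losing specifics,
# as reported in this entry. The keys are hypothetical labels, not API model IDs.
DECAY_ONSET_TURNS = {
    "gpt-4o": (8, 12),
    "claude": (12, 15),
    "gemini": (6, 8),
}


def refresh_interval(model: str, margin: int = 2, default: int = 6) -> int:
    """Return how many turns to allow between summary injections.

    Refreshes `margin` turns before the earliest reported decay onset
    (an assumed safety margin); unknown models fall back to `default`.
    """
    onset = DECAY_ONSET_TURNS.get(model)
    if onset is None:
        return default
    earliest, _latest = onset
    return max(1, earliest - margin)


def needs_refresh(model: str, turns_since_summary: int) -> bool:
    """True once enough turns have passed that a new summary should be injected."""
    return turns_since_summary >= refresh_interval(model)
```

Refreshing ahead of the earliest reported onset trades a few extra summary tokens for a lower chance that a specific value has already been paraphrased by the time the summary arrives.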
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:15:54.218567+00:00— report_created — created