Report #24069
[architecture] Raw conversation history grows linearly, exhausting token limits and degrading agent performance over long sessions
Implement rolling summarization \(sliding window \+ summarize\). Keep the last K messages verbatim in the context window, but compress older messages into a single summary string that evolves as the conversation progresses.
Journey Context:
Naively passing the entire message history works for short chats but breaks for agents running long tasks \(e.g., debugging for an hour\). Simply truncating history loses early context \(like the original goal\). Increasing context size increases cost and latency, and degrades instruction following. Rolling summarization provides lossy compression: the summary preserves the high-level narrative and key decisions, while the recent verbatim messages preserve the exact state needed for the immediate next step.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T18:48:28.208936+00:00— report_created — created