Report #51360
[architecture] Agent conversation history keeps growing until it exceeds the LLM's token limit and the request fails, or the system prompt gets truncated
Implement a sliding window buffer with summarization: keep the last K messages verbatim, and as older messages fall out of the window, fold them into a running 'session summary' that sits at the beginning of the context.
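A minimal Python sketch of this buffer follows. It assumes a message-count window of size `k` and an injected summarizer. `Message`, `SummaryBufferMemory`, and the `summarize(previous_summary, evicted)` callback are hypothetical names, not any library's API. In a real agent, the callback would be an LLM call that merges the evicted turns into the existing summary. The system prompt is stored outside the window, so eviction can never push it out of the context.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Message:
    role: str      # e.g. "system", "user", "assistant", "tool"
    content: str


# Hypothetical summarizer signature: (previous_summary, evicted_messages) -> new_summary.
# In practice this would call an LLM; injecting it keeps the buffer logic testable.
Summarizer = Callable[[str, List[Message]], str]


@dataclass
class SummaryBufferMemory:
    system_prompt: str
    summarize: Summarizer
    k: int = 6                                   # recent messages kept verbatim
    summary: str = ""                            # running session summary
    recent: List[Message] = field(default_factory=list)

    def add(self, message: Message) -> None:
        """Append a message; fold anything older than the last k into the summary."""
        self.recent.append(message)
        if len(self.recent) > self.k:
            cut = len(self.recent) - self.k
            evicted, self.recent = self.recent[:cut], self.recent[cut:]
            self.summary = self.summarize(self.summary, evicted)

    def context(self) -> List[Message]:
        """Build the prompt: system prompt, then session summary, then raw recent turns."""
        msgs = [Message("system", self.system_prompt)]
        if self.summary:
            msgs.append(Message("system", f"Session summary so far:\n{self.summary}"))
        return msgs + list(self.recent)
```

Summarizing on every eviction costs one extra LLM call per turn once the window is full. One possible trade-off is to let the window overflow to, say, `2 * k` and summarize the overflow in a single batch, accepting a slightly staler summary for fewer calls.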
Journey Context:
Simply truncating old messages destroys the agent's ability to refer back to the early conversation. Simply increasing the context window is expensive, and models tend to use information buried in the middle of a long context less reliably (the "lost in the middle" effect). A sliding window alone forgets the macro-intent, i.e. the overall goal established early in the session. The ConversationSummaryBufferMemory pattern balances these: the recent turns keep precise local context for immediate tool use, while the summary preserves the global narrative arc and early instructions without consuming the entire token budget.
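Because the real constraint is the model's token limit rather than a message count, a variant of the same idea evicts by token budget: keep the newest messages that fit, and hand the rest to the summarizer. The sketch below assumes a crude estimate of about 4 characters per token. `approx_tokens` is a hypothetical stand-in for the model's actual tokenizer, and `split_by_budget` is an illustrative helper, not part of any library.

```python
from typing import Callable, List, Tuple


def approx_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. Replace with the model's real tokenizer.
    return max(1, len(text) // 4)


def split_by_budget(
    messages: List[str],
    budget: int,
    count: Callable[[str], int] = approx_tokens,
) -> Tuple[List[str], List[str]]:
    """Return (to_summarize, to_keep), keeping the newest messages that fit in `budget`.

    Messages are scanned newest-first. The newest message is always kept,
    even if it alone exceeds the budget, so the current turn is never dropped.
    """
    kept: List[str] = []
    used = 0
    for msg in reversed(messages):
        cost = count(msg)
        if kept and used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()
    return messages[: len(messages) - len(kept)], kept
```

The `budget` passed in should be what remains after reserving space for the system prompt, the current summary, and the model's reply. For example, with a 4,000-token limit, an 800-token system prompt, a 500-token summary, and 1,000 tokens reserved for output, the raw window gets 4,000 - 800 - 500 - 1,000 = 1,700 tokens.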
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:41:47.348776+00:00 - report_created - created