Report #92521
[agent\_craft] Memory bloat from unbounded conversation history
Implement sliding window with recursive summarization: keep last 4-6 turns verbatim, compress older turns into a 'working memory' summary \(bullet points of facts/decisions\), and refresh the summary every N turns to prevent error accumulation.
Journey Context:
Agents often carry full conversation history until they hit the context limit, then truncate naively \(e.g., FIFO\). This causes critical early instructions \(like 'use TypeScript'\) to be dropped while preserving irrelevant middle turns. It also wastes tokens on redundant pleasantries. Simple truncation \(cutting oldest messages\) loses the 'why' behind current state. The fix is a two-tier memory: short-term \(recent verbatim\) and long-term \(compressed summary\). The 'Generative Agents' paper showed that summarizing memories periodically maintains coherence better than truncation. Specifically: keep the last N turns \(where N=4-6, enough for immediate context\) verbatim; everything older is distilled into a 'condensed memory' section at the start of context \(after system prompt\). This summary must be regenerated every few turns to prevent staleness \(e.g., if turn 5 contradicts turn 1, the summary must reflect the update\). This prevents the 'drift' where old, wrong assumptions persist because they were buried in history that was neither summarized nor forgotten.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:53:18.115922+00:00— report_created — created