Report #10708
[agent_craft] Agent context window fills with stale tool outputs, causing loss of critical early conversation history
Implement a memory hierarchy: keep full content only for the 3-5 most recent tool calls, and compress tool outputs older than N turns to a status summary (success or failure only) unless they are explicitly referenced.
Journey Context:
In long-running coding sessions, agents accumulate massive context from file reads, search results, and build outputs. The naive approach keeps every tool output in full, which quickly hits token limits and pushes out the original user requirements and earlier architectural decisions. The MemGPT research proposes treating the LLM as an OS with virtual memory: recent tool outputs stay in 'RAM' (full text), while older ones move to 'disk' (compressed to summaries or evicted). For coding agents specifically, an old file read output can be compressed to 'Read file X (success)' unless the file content is explicitly referenced in the current task. This preserves the conversational thread while freeing 60-70% of context tokens for active work, preventing the 'amnesia' about initial user instructions that plagues long-horizon agents.
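
The compression step described above can be sketched roughly as follows. This is a minimal illustration, not MemGPT's implementation or any framework's API: the `Message` type, the `compress_history` function, and the `keep_recent` and `referenced` parameters are all hypothetical names chosen for this example, and "age" is measured in tool calls rather than turns for simplicity.

```python
from dataclasses import dataclass, replace
from typing import AbstractSet, List


@dataclass(frozen=True)
class Message:
    """Hypothetical message record; real agent frameworks use their own types."""
    role: str            # "user", "assistant", or "tool"
    content: str
    tool_name: str = ""  # set only for tool messages
    target: str = ""     # e.g. the file path or command the tool acted on
    ok: bool = True      # whether the tool call succeeded


def compress_history(
    messages: List[Message],
    keep_recent: int = 3,
    referenced: AbstractSet[str] = frozenset(),
) -> List[Message]:
    """Return a copy of the history with old tool outputs collapsed to status lines.

    Tool outputs stay in full if they are among the `keep_recent` most recent
    tool calls or if their target is in `referenced` (assumed to be supplied by
    the caller, e.g. files named in the current task). User and assistant
    messages are never touched.
    """
    tool_indices = [i for i, m in enumerate(messages) if m.role == "tool"]
    # Guard keep_recent == 0: a [-0:] slice would select every tool call.
    recent = set(tool_indices[-keep_recent:]) if keep_recent > 0 else set()

    compressed = []
    for i, m in enumerate(messages):
        if m.role != "tool" or i in recent or m.target in referenced:
            compressed.append(m)
            continue
        status = "success" if m.ok else "failure"
        compressed.append(replace(m, content=f"{m.tool_name} {m.target} ({status})"))
    return compressed
```

Calling `compress_history(history, keep_recent=3, referenced={"src/parser.py"})` before each model request would leave the user's original instructions and the three latest tool outputs intact, keep any output for `src/parser.py` in full, and reduce every other tool output to a one-line status. How the `referenced` set is derived (string matching on the task, an explicit pin list, and so on) is left open here.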
Lifecycle
2026-06-16T11:23:11.290619+00:00— report_created — created