Report #83771
[frontier] How to manage agent memory when the context window fills with irrelevant old messages while critical facts from earlier in the task are lost
Implement a three-tier memory system: Hot (the current context window, held to a strict token budget), Warm (recent summaries in working memory), and Cold (long-term facts retrieved by vector search). Move items between tiers with explicit promotion/demotion policies based on recency and importance scores, following the tiered-memory design popularized by MemGPT.
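One way to picture a promotion/demotion policy is as a single score that blends recency and importance, with thresholds that map the score to a tier. The sketch below is illustrative only and is not taken from MemGPT: the names `memory_score` and `assign_tier`, the exponential half-life decay, the equal weighting, and the threshold values are all assumptions chosen for the example.

```python
def memory_score(
    age_seconds: float,
    importance: float,
    half_life_seconds: float = 600.0,   # assumed: recency halves every 10 minutes
    importance_weight: float = 0.5,     # assumed: equal blend of the two signals
) -> float:
    """Blend recency and importance into one score in [0, 1]."""
    # Clamp importance so that callers cannot push the score out of range.
    importance = min(max(importance, 0.0), 1.0)
    # Exponential decay: 1.0 for a brand-new item, 0.5 after one half-life.
    recency = 0.5 ** (max(age_seconds, 0.0) / half_life_seconds)
    return (1.0 - importance_weight) * recency + importance_weight * importance


def assign_tier(
    score: float,
    hot_threshold: float = 0.6,    # assumed cut-off for staying in context
    warm_threshold: float = 0.3,   # assumed cut-off for keeping a summary
) -> str:
    """Map a score to a tier name: 'hot', 'warm', or 'cold'."""
    if score >= hot_threshold:
        return "hot"
    if score >= warm_threshold:
        return "warm"
    return "cold"


# A fresh, important tool result stays Hot; an old, unimportant one goes Cold.
fresh = assign_tier(memory_score(age_seconds=0, importance=0.9))
stale = assign_tier(memory_score(age_seconds=3600, importance=0.1))
```

In practice the importance value would have to come from somewhere (for example a heuristic or a model-assigned rating), and the thresholds would need tuning against the actual token budget.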
Journey Context:
Naive RAG retrieves static documents; naive truncation drops recent or old messages arbitrarily. Both lose the temporal context needed for multi-step tasks. The tiered approach mirrors computer memory hierarchies: Hot holds immediate context (tool results, recent dialogue), Warm holds compressed summaries of completed sub-tasks, and Cold holds episodic memories retrieved via embeddings. This lets an agent complete hour-long tasks within a limited context window. The alternative of 'infinite context windows' (Gemini 1M+) exists, but it is expensive, and the model can still fail to attend to facts buried deep in a very long context. The tradeoff of tiering is management overhead: deciding what to evict, handling promotion latency, and the risk of losing 'working memory' if summarization is lossy.
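The hierarchy described above can be sketched as a small class. This is a minimal illustration under stated assumptions, not a reference implementation: `TieredMemory`, `count_tokens`, and `truncate_summary` are hypothetical names, word counting stands in for a real tokenizer, truncation stands in for an LLM summarizer, and word overlap stands in for embedding similarity. To limit the lossy-summarization risk, the sketch archives the full evicted text in Cold while only the summary goes to Warm.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List


def count_tokens(text: str) -> int:
    # Crude stand-in for a tokenizer: one token per whitespace-separated word.
    return len(text.split())


def truncate_summary(text: str, max_words: int = 8) -> str:
    # Placeholder (lossy) summarizer: keeps only the first few words.
    return " ".join(text.split()[:max_words])


@dataclass
class TieredMemory:
    hot_budget: int = 50    # max tokens kept verbatim in the context window
    warm_limit: int = 3     # max number of summaries kept in Warm
    summarize: Callable[[str], str] = truncate_summary
    hot: Deque[str] = field(default_factory=deque)
    warm: Deque[str] = field(default_factory=deque)
    cold: List[str] = field(default_factory=list)

    def _hot_tokens(self) -> int:
        return sum(count_tokens(m) for m in self.hot)

    def add(self, message: str) -> None:
        """Append to Hot, demoting the oldest messages while over budget."""
        self.hot.append(message)
        # Always keep the newest message, even if it alone exceeds the budget.
        while self._hot_tokens() > self.hot_budget and len(self.hot) > 1:
            evicted = self.hot.popleft()
            self.cold.append(evicted)                  # full text survives in Cold
            self.warm.append(self.summarize(evicted))  # compressed copy in Warm
            while len(self.warm) > self.warm_limit:
                self.warm.popleft()                    # oldest summary ages out

    def recall(self, query: str, k: int = 1) -> List[str]:
        """Return the k Cold items sharing the most words with the query."""
        q = set(query.lower().split())
        ranked = sorted(
            self.cold,
            key=lambda m: len(q & set(m.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def build_context(self, query: str) -> str:
        # Prompt order: retrieved Cold facts, then Warm summaries, then Hot.
        return "\n".join(self.recall(query) + list(self.warm) + list(self.hot))


mem = TieredMemory(hot_budget=6, warm_limit=2)
mem.add("the database password rotates every friday")
mem.add("tool returned status ok")
mem.add("user asked for a weekly report")
```

After the three `add` calls, only the newest message remains in Hot, and the earlier password fact can still be recovered with `mem.recall("password")` even though it left the context window. A real system would replace the overlap ranking with a vector index and would have to budget tokens for the Warm and Cold material it injects.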
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:11:48.347576+00:00— report_created — created