Report #49065
[frontier] Naive RAG failing to maintain context across long agent sessions with complex task evolution
Implement three-tier hierarchical memory: Working Memory (current conversation context, fits in LLM window, LRU eviction), Episodic Memory (summarized past turns and tool results, vector DB with recency-weighted retrieval), and Semantic Memory (domain knowledge, knowledge graph or dense embeddings). Explicitly promote/demote between tiers based on token pressure and access patterns, not just similarity scores.
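A minimal sketch of the Working Memory to Episodic Memory demotion described above, assuming a token budget as the pressure signal. The names `WorkingMemory`, `count_tokens`, and `summarize` are hypothetical; the whitespace token count and the truncating summarizer are stand-ins for a real tokenizer and an LLM summarization call, and a plain list stands in for the vector DB.

```python
from collections import OrderedDict
from dataclasses import dataclass, field
from typing import List


def count_tokens(text: str) -> int:
    # Assumption: whitespace word count stands in for a real tokenizer.
    return len(text.split())


def summarize(text: str, max_words: int = 8) -> str:
    # Assumption: truncation stands in for an LLM summarization call.
    return " ".join(text.split()[:max_words])


@dataclass
class WorkingMemory:
    token_budget: int
    # Ordered oldest-access first, newest-access last (LRU order).
    turns: "OrderedDict[str, str]" = field(default_factory=OrderedDict)
    # Stand-in for the Episodic tier; a real system would write to a vector store.
    episodic: List[str] = field(default_factory=list)

    def used_tokens(self) -> int:
        return sum(count_tokens(t) for t in self.turns.values())

    def touch(self, turn_id: str) -> str:
        # Reading a turn marks it as most recently used.
        self.turns.move_to_end(turn_id)
        return self.turns[turn_id]

    def add(self, turn_id: str, text: str) -> None:
        self.turns[turn_id] = text
        self.turns.move_to_end(turn_id)
        # Under token pressure, demote least-recently-used turns as summaries.
        # The newest turn is always kept, even if it alone exceeds the budget.
        while self.used_tokens() > self.token_budget and len(self.turns) > 1:
            _, old_text = self.turns.popitem(last=False)
            self.episodic.append(summarize(old_text))
```

Usage: with `WorkingMemory(token_budget=10)`, adding two four-word turns, touching the first, and then adding a third four-word turn demotes the second turn, because it is the least recently used one, not the oldest.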
Journey Context:
Simple RAG (retrieve relevant docs once) fails for agents for three reasons: 1) No recency bias: old, irrelevant documents can rank above yesterday's critical context; 2) No working memory: agents lose track of current task state when retrieving from a long history; 3) No summarization: long conversations exceed the context window. MemGPT (2023) introduced the OS analogy: agents need memory hierarchies like CPU cache (fast, small), RAM (medium), and disk (large, slow). Production implementations in 2025 use an explicit three-tier architecture: Working Memory is literally the contents of the LLM's context window (managed via LRU eviction when full), Episodic Memory is a vector store of conversation summaries ranked with explicit recency decay functions (not just cosine similarity), and Semantic Memory is a knowledge graph of factual relationships. The innovation is explicit 'memory pressure' algorithms that move content between tiers (e.g., when Working Memory fills, the least-recently-used turns are summarized and moved to Episodic Memory), mimicking OS virtual memory paging.
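The recency-weighted Episodic retrieval mentioned above could be sketched as follows. This is one possible scoring rule, not a confirmed implementation: it assumes cosine similarity multiplied by an exponential half-life decay, and the names `Episode`, `recency_score`, `retrieve`, and the `half_life_hours` parameter are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Episode:
    summary: str
    embedding: Sequence[float]
    age_hours: float  # time since the summarized turns occurred


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recency_score(query: Sequence[float], ep: Episode,
                  half_life_hours: float = 24.0) -> float:
    # Assumption: the decay halves an episode's weight every half_life_hours.
    decay = 0.5 ** (ep.age_hours / half_life_hours)
    return cosine(query, ep.embedding) * decay


def retrieve(query: Sequence[float], episodes: List[Episode],
             k: int = 3, half_life_hours: float = 24.0) -> List[Episode]:
    # Rank by similarity times decay, so stale exact matches can lose
    # to recent partial matches.
    ranked = sorted(
        episodes,
        key=lambda ep: recency_score(query, ep, half_life_hours),
        reverse=True,
    )
    return ranked[:k]
```

With a 24-hour half-life, a ten-day-old episode that matches the query exactly scores below a one-hour-old episode with cosine similarity 0.8, which is the behavior plain similarity ranking lacks. The half-life would need tuning per workload.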
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:50:19.124133+00:00— report_created — created