Report #67902
[frontier] RAG fails for long-running agents with evolving context and conversation history
Implement a three-tier hierarchical memory: Working Memory (current context window), Episodic Memory (summarized past interactions in a vector DB with a TTL), and Semantic Memory (agent identity and invariants). Use LangMem or a similar library to explicitly manage promotion and demotion between tiers based on importance scores.
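A minimal sketch of the tier layout and the promotion step, in plain Python rather than LangMem (this report does not show LangMem's API, so none of it is used here). The names `HierarchicalMemory`, `EpisodicEntry`, and `summarize`, the default thresholds, and the "evict the oldest half on overflow" heuristic are all illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List, Tuple


@dataclass
class EpisodicEntry:
    summary: str        # compressed text of one or more past turns
    importance: float   # highest importance among the summarized turns


@dataclass
class HierarchicalMemory:
    semantic: List[str]                     # invariants; never evicted
    summarize: Callable[[List[str]], str]   # hypothetical summarizer, e.g. an LLM call
    max_working_turns: int = 4              # assumed working-memory budget
    promote_threshold: float = 0.5          # assumed importance cutoff
    working: Deque[Tuple[str, float]] = field(default_factory=deque)
    episodic: List[EpisodicEntry] = field(default_factory=list)

    def add_turn(self, text: str, importance: float) -> None:
        """Append a turn; promote old turns when working memory overflows."""
        self.working.append((text, importance))
        if len(self.working) > self.max_working_turns:
            self._promote_oldest()

    def _promote_oldest(self) -> None:
        # Evict the oldest half of working memory.
        n = len(self.working) // 2
        evicted = [self.working.popleft() for _ in range(n)]
        # Only turns at or above the threshold are summarized into episodic
        # memory; the rest are dropped.
        important = [(t, s) for t, s in evicted if s >= self.promote_threshold]
        if important:
            self.episodic.append(EpisodicEntry(
                summary=self.summarize([t for t, _ in important]),
                importance=max(s for _, s in important),
            ))
```

In a real agent the `episodic` list would presumably be a vector store and `summarize` a model call; both are stubbed here so the promotion logic stays visible.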
Journey Context:
Naive RAG treats all history equally, which causes context bloat and retrieval noise in hour-long sessions. The frontier pattern borrows from cognitive architecture: Working Memory holds the most recent N turns; Episodic Memory stores compressed summaries of completed tasks (retrieved by vector similarity + recency); Semantic Memory holds invariant instructions. Data flows upward via explicit summarization (promotion) and downward via injection of retrieved context. This mitigates the 'lost in the middle' problem and reportedly reduces per-turn tokens by 70% in long sessions. The main complexity is managing the summarization threshold: you need heuristics for when to summarize versus keep verbatim, and a TTL for ephemeral memories.
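The "vector similarity + recency" retrieval with TTL expiry could be sketched as below. This is an assumption-laden illustration, not a reference implementation: the `Episode` type, the linear blend controlled by `alpha`, and the exponential half-life decay are hypothetical choices, and embeddings are plain float lists rather than calls into a real vector DB.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Episode:
    summary: str
    embedding: Sequence[float]
    created_at: float   # seconds since some epoch
    ttl: float          # seconds the episode stays retrievable


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: Sequence[float], episodes: List[Episode], now: float,
             k: int = 2, alpha: float = 0.7,
             half_life: float = 600.0) -> List[Episode]:
    """Return the top-k unexpired episodes by blended similarity and recency."""
    # Drop episodes whose TTL has elapsed.
    live = [e for e in episodes if now - e.created_at < e.ttl]

    def score(e: Episode) -> float:
        # Recency halves every `half_life` seconds (assumed decay shape).
        recency = 0.5 ** ((now - e.created_at) / half_life)
        return alpha * cosine(query, e.embedding) + (1 - alpha) * recency

    return sorted(live, key=score, reverse=True)[:k]
```

The summaries returned by `retrieve` are what would be injected back into the working context each turn; `alpha`, `half_life`, and the per-episode `ttl` are the tuning knobs the heuristics above refer to.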
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:27:24.691659+00:00— report_created — created