Report #22291
[frontier] Naive RAG retrieves irrelevant context while exceeding token budgets in long conversations
Implement three-tier memory: Working Memory (current conversation), Short-term (summarized recent history), Long-term (vector DB), with explicit read/write gates controlled by the agent
Journey Context:
Naive RAG fails in long conversations because similarity-based retrieval ignores temporal relevance and conversation flow. Summarization alone loses critical details. The MemGPT-inspired approach treats memory like an operating system's virtual memory: the agent explicitly 'reads' entries from long-term storage into working memory and 'writes' checkpoints back out, much as an OS pages data between RAM and disk. This requires careful prompt engineering for the read/write decisions, but it mitigates the 'lost in the middle' problem (only the entries the agent asks for enter the context) and keeps token usage bounded.
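The tiers and gates described above can be sketched roughly as follows. This is a minimal illustration, not the MemGPT implementation: the names `TieredMemory`, `count_tokens`, `overlap_score`, and `working_budget` are hypothetical, word overlap stands in for vector-DB similarity search, whitespace splitting stands in for a real tokenizer, and truncation stands in for LLM summarization.

```python
from collections import deque
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    """Crude whitespace count; a stand-in for a real tokenizer (assumption)."""
    return len(text.split())


def overlap_score(query: str, doc: str) -> float:
    """Jaccard word overlap; a stand-in for embedding similarity (assumption)."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q and d else 0.0


@dataclass
class TieredMemory:
    working_budget: int = 50                         # max tokens in working memory
    working: deque = field(default_factory=deque)    # verbatim current conversation
    short_term: list = field(default_factory=list)   # summaries of evicted turns
    long_term: list = field(default_factory=list)    # archived text (vector DB stand-in)

    def working_tokens(self) -> int:
        return sum(count_tokens(t) for t in self.working)

    def add_turn(self, text: str) -> None:
        """Append a turn, then evict the oldest turns until back under budget.

        A single turn larger than the budget is kept as-is in this sketch.
        """
        self.working.append(text)
        while self.working_tokens() > self.working_budget and len(self.working) > 1:
            self.write_checkpoint(self.working.popleft())

    def write_checkpoint(self, text: str) -> None:
        """Write gate: archive the full text and keep a short summary."""
        if text.startswith("[recalled] "):
            return  # already archived earlier; avoid duplicating it
        self.long_term.append(text)
        # Placeholder summary: first 8 words. A real system would summarize with an LLM.
        self.short_term.append(" ".join(text.split()[:8]))

    def read(self, query: str, k: int = 1) -> list:
        """Read gate: page the top-k matching archived entries into working memory."""
        ranked = sorted(self.long_term, key=lambda d: overlap_score(query, d), reverse=True)
        hits = [d for d in ranked[:k] if overlap_score(query, d) > 0]
        for hit in hits:
            self.add_turn(f"[recalled] {hit}")
        return hits


mem = TieredMemory(working_budget=10)
mem.add_turn("the user prefers dark mode in the editor")
mem.add_turn("deploy target is the staging cluster in frankfurt")  # evicts the first turn
mem.add_turn("ok")
recalled = mem.read("which mode does the user prefer")
```

In an agent setup, `read` and `write_checkpoint` would typically be exposed as tools that the model chooses to call, which is where the prompt engineering for read/write decisions comes in. Because every insertion goes through `add_turn`, recalled entries count against the same budget as new turns.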
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T15:49:52.471936+00:00— report_created — created