Report #20907
[frontier] RAG retrieval floods context window with irrelevant chunks exceeding token limits
Replace naive RAG with a structured 'working memory' hierarchy of three tiers: short-term (the context window), working (a key-value store of facts, each with a TTL), and episodic (summaries of past interactions). The agent explicitly calls memory tools (e.g., 'remember', 'recall') instead of relying on semantic search over raw history.
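A minimal sketch of the working tier and the explicit tool calls, assuming a hypothetical tool-call shape of `{"name": ..., "args": {...}}`. The class names `WorkingMemory` and `make_tool_dispatcher`, the default TTL, and the injectable clock are illustrative choices, not part of the original report. Facts are fetched by exact key, so nothing irrelevant is pulled in by similarity.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class WorkingMemory:
    """Working tier: key-value facts, each with its own time-to-live (TTL)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock  # injectable so expiry can be tested deterministically
        self._store: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)

    def remember(self, key: str, value: str, ttl_s: float = 3600.0) -> str:
        self._store[key] = (value, self._clock() + ttl_s)
        return f"stored {key}"

    def recall(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # lazily evict an expired fact on read
            return None
        return value


def make_tool_dispatcher(wm: WorkingMemory):
    """Route the agent's explicit memory tool calls; no similarity search involved."""
    tools = {"remember": wm.remember, "recall": wm.recall}

    def dispatch(call: dict):
        # Hypothetical call shape: {"name": "recall", "args": {"key": "..."}}
        return tools[call["name"]](**call["args"])

    return dispatch


# Usage with a fake clock so expiry is deterministic.
now = [0.0]
wm = WorkingMemory(clock=lambda: now[0])
dispatch = make_tool_dispatcher(wm)
dispatch({"name": "remember", "args": {"key": "user_tz", "value": "UTC+2", "ttl_s": 60}})
print(dispatch({"name": "recall", "args": {"key": "user_tz"}}))  # UTC+2
now[0] = 61.0
print(dispatch({"name": "recall", "args": {"key": "user_tz"}}))  # None
```

Expiring on read keeps the sketch small; a real store might also sweep expired keys in the background.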
Journey Context:
Vector similarity retrieves noise: irrelevant chunks waste 40% of the context window. A MemGPT-style architecture treats memory the way an OS does: paginated, with explicit I/O. The agent uses tool calls to manage its own memory (compressing context into working memory, flushing it to episodic memory). This avoids the 'lost in the middle' problem of long contexts. Alternative considered: bigger context windows, which still carry O(n) search cost and suffer from attention decay on long documents.
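The paging behaviour can be sketched as a token-budgeted context window that evicts its oldest messages and flushes them, summarized, to episodic memory. This is a sketch under stated assumptions: token counts are approximated by whitespace-separated words rather than a real tokenizer, and the `summarize` callable is hypothetical (in practice it would likely be an LLM call). `ContextWindow`, `EpisodicMemory`, and `approx_tokens` are illustrative names.

```python
from typing import Callable, List


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


class EpisodicMemory:
    """Episodic tier: one summary per batch of messages paged out of context."""

    def __init__(self, summarize: Callable[[List[str]], str]):
        self._summarize = summarize  # hypothetical summarizer, e.g. an LLM call
        self.episodes: List[str] = []

    def flush(self, messages: List[str]) -> None:
        if messages:
            self.episodes.append(self._summarize(messages))


class ContextWindow:
    """Short-term tier: a token-budgeted buffer that pages out to episodic memory."""

    def __init__(self, budget: int, episodic: EpisodicMemory):
        self.budget = budget
        self.episodic = episodic
        self.messages: List[str] = []

    def used(self) -> int:
        return sum(approx_tokens(m) for m in self.messages)

    def append(self, message: str) -> None:
        cost = approx_tokens(message)
        if cost > self.budget:
            raise ValueError("single message exceeds the context budget")
        evicted: List[str] = []
        # Evict oldest messages first until the new one fits.
        while self.used() + cost > self.budget:
            evicted.append(self.messages.pop(0))
        self.episodic.flush(evicted)
        self.messages.append(message)


episodic = EpisodicMemory(summarize=lambda msgs: f"{len(msgs)} msgs: " + " | ".join(msgs))
ctx = ContextWindow(budget=10, episodic=episodic)
ctx.append("one two three four")    # 4 tokens used
ctx.append("five six seven eight")  # 8 tokens used
ctx.append("nine ten eleven")       # 8 + 3 > 10, so the oldest message is paged out
print(ctx.messages)       # ['five six seven eight', 'nine ten eleven']
print(episodic.episodes)  # ['1 msgs: one two three four']
```

The window never grows past its budget, so cost stays bounded regardless of history length; what was evicted survives only as a summary.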
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T13:30:30.484455+00:00— report_created — created