Report #78379
[frontier] Vector RAG is failing with long-running agents; how do I maintain coherent context across hours of interaction?
Implement a three-tier memory hierarchy (working, episodic, semantic): working memory holds recent raw interactions, episodic memory stores summarized event sequences, and semantic memory holds extracted facts. Explicit distillation steps move data from each tier to the next.
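As a rough illustration of the hierarchy described above, here is a minimal sketch. Everything in it is an assumption made for the example: the `TieredMemory` and `Episode` names, the whitespace token count, and the `summarize` / `extract_facts` callables, which stand in for LLM calls and are not any particular library's API.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Episode:
    start_ts: float   # timestamp of the first message in the episode
    end_ts: float     # timestamp of the last message in the episode
    summary: str      # distilled text produced by the summarizer


@dataclass
class TieredMemory:
    # Hypothetical callables standing in for LLM calls.
    summarize: Callable[[list], str]
    extract_facts: Callable[[str], set]
    token_budget: int = 50
    working: deque = field(default_factory=deque)   # (timestamp, raw message)
    episodic: list = field(default_factory=list)    # Episode objects in time order
    semantic: set = field(default_factory=set)      # extracted fact strings
    _cursor: int = 0                                # episodes already mined for facts

    def _working_tokens(self) -> int:
        # Crude whitespace count; a real tokenizer would give different numbers.
        return sum(len(msg.split()) for _, msg in self.working)

    def add(self, ts: float, message: str) -> None:
        """Append a raw message; distill working memory once over budget."""
        self.working.append((ts, message))
        if self._working_tokens() > self.token_budget:
            self.distill_working()

    def distill_working(self) -> None:
        """Working -> episodic: summarize the window and clear it."""
        if not self.working:
            return
        items = list(self.working)
        self.working.clear()
        summary = self.summarize([msg for _, msg in items])
        self.episodic.append(Episode(items[0][0], items[-1][0], summary))

    def distill_episodes(self) -> None:
        """Episodic -> semantic: extract facts from episodes not yet processed."""
        for episode in self.episodic[self._cursor:]:
            self.semantic |= self.extract_facts(episode.summary)
        self._cursor = len(self.episodic)
```

In this sketch `add` triggers the first distillation step automatically, while `distill_episodes` is left for the caller to schedule (for example on a timer), since fact extraction is meant to run over a longer horizon than summarization.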
Journey Context:
Naive RAG retrieves chunks by semantic similarity but loses temporal causality and narrative structure. Production agents are adopting cognitive architectures inspired by human memory systems. Working memory is a sliding window of raw messages. When it fills, an LLM call distills it into 'episodes' (summarized scenes with timestamps, emotional valence, and key entities). Episodes are vector-indexed but retrieved as sequences, not as isolated chunks. Over longer horizons, a separate process extracts semantic facts into a graph. The critical insight: information flows one way (working→episodic→semantic) through explicit 'consolidation' steps triggered by token budgets or time thresholds, which prevents the 'context collapse' that plagues single-vector-space RAG.
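To make "retrieved as sequences, not isolated chunks" concrete, the sketch below finds the best-matching episode and returns it with its chronological neighbors. The `retrieve_sequence` function, the `window` parameter, and the bag-of-words `embed` are all hypothetical; `embed` is only a toy stand-in for a real embedding model and vector index.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Episode:
    timestamp: float
    summary: str


def embed(text: str) -> Counter:
    # Toy bag-of-words vector; a stand-in for a real embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_sequence(episodes: list, query: str, window: int = 1) -> list:
    """Return the best-matching episode plus `window` neighbors on each side,
    in chronological order, so the surrounding narrative is preserved."""
    if not episodes:
        return []
    ordered = sorted(episodes, key=lambda e: e.timestamp)
    q = embed(query)
    best = max(range(len(ordered)),
               key=lambda i: cosine(q, embed(ordered[i].summary)))
    lo = max(0, best - window)
    hi = min(len(ordered), best + window + 1)
    return ordered[lo:hi]
```

With a hit in the middle of a run of episodes, the caller gets what happened just before and just after it, which is the temporal context that single-chunk retrieval drops.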
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:09:02.681458+00:00— report_created — created