Report #83931
[frontier] Long-running agents lose track of critical early context or exceed token limits during extended sessions
Implement a tiered memory hierarchy: working memory (hot context), episodic memory (vector store of summaries), and procedural memory (tool schemas), with explicit compression heuristics and recall triggers
Journey Context:
Simple 'keep last N messages' truncation loses critical details (e.g., user preferences stated in the first hour of a session). Arbitrarily large context windows are not the answer either: they are expensive, and the extra material adds noise. The emerging production pattern mimics a computer's memory hierarchy: L1 (current turn + immediate scratchpad), L2 (relevant history retrieved via semantic search from a vector store of conversation summaries), and L3 (archived high-importance facts). What is new is 'active forgetting' (scoring items by importance and discarding the low scorers) and 'predictive recall' (triggering L2 retrieval based on query intent, not just vector similarity). MemGPT (https://memgpt.ai/) pioneered the OS metaphor, and LangMem (https://langchain-ai.github.io/langmem/) is one open-source toolkit for building this kind of long-term agent memory. The key is decoupling 'what the LLM sees now' (a limited window) from 'what the agent knows' (unbounded, searchable).
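The three tiers and the two heuristics described above can be sketched in a few dozen lines of standard-library Python. This is a minimal illustration under stated assumptions, not the MemGPT or LangMem API: every name here (`TieredMemory`, `score_importance`, `RECALL_CUES`, `IMPORTANT_CUES`) is hypothetical, the bag-of-words cosine stands in for a real embedding model and vector store, the "summary" is plain truncation where a real system would likely call an LLM, and the keyword lists are toy stand-ins for a learned importance or intent classifier.

```python
import math
import re
from collections import Counter, deque
from dataclasses import dataclass, field

# Hypothetical cue lists; a production system would likely use a classifier.
RECALL_CUES = ("earlier", "remember", "previously", "you said")
IMPORTANT_CUES = ("prefer", "always", "never", "must", "allergic")


def tokenize(text: str) -> Counter:
    """Bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_importance(text: str) -> float:
    """Toy heuristic: each cue found adds 0.5, capped at 1.0."""
    lowered = text.lower()
    hits = sum(cue in lowered for cue in IMPORTANT_CUES)
    return min(1.0, hits / 2)


@dataclass
class TieredMemory:
    window: int = 4                 # L1 capacity, in messages
    max_episodes: int = 50          # L2 capacity, in summaries
    archive_threshold: float = 0.5  # importance needed to reach L3
    working: deque = field(default_factory=deque)  # L1: hot context
    episodes: list = field(default_factory=list)   # L2: (summary, importance)
    archive: list = field(default_factory=list)    # L3: high-importance facts

    def add(self, message: str) -> None:
        """Append to L1 and demote whatever no longer fits in the window."""
        self.working.append(message)
        while len(self.working) > self.window:
            self._demote(self.working.popleft())

    def _demote(self, message: str) -> None:
        importance = score_importance(message)
        if importance >= self.archive_threshold:
            self.archive.append(message)  # kept verbatim in L3
            return
        summary = message[:80]  # placeholder for an LLM-written summary
        self.episodes.append((summary, importance))
        if len(self.episodes) > self.max_episodes:
            # Active forgetting: drop the least important (oldest on ties).
            self.episodes.remove(min(self.episodes, key=lambda e: e[1]))

    def wants_recall(self, query: str) -> bool:
        """Intent trigger: does the query refer back to past turns?"""
        lowered = query.lower()
        return any(cue in lowered for cue in RECALL_CUES)

    def recall(self, query: str, k: int = 2) -> list:
        """Return the k L2 summaries most similar to the query."""
        q = tokenize(query)
        ranked = sorted(self.episodes,
                        key=lambda e: cosine(q, tokenize(e[0])),
                        reverse=True)
        return [summary for summary, _ in ranked[:k]]

    def build_context(self, query: str) -> list:
        """What the LLM sees now: L3 facts, optional L2 recall, then L1."""
        recalled = self.recall(query) if self.wants_recall(query) else []
        return list(self.archive) + recalled + list(self.working)


mem = TieredMemory(window=2)
for turn in ["I prefer metric units, always.",
             "What is the weather in Paris?",
             "Convert 5 miles to km.",
             "Thanks!"]:
    mem.add(turn)
context = mem.build_context("What did I ask earlier about the weather?")
```

In this sketch the stated preference survives eviction from the two-message window because it scores high enough to be archived, while the weather question is only pulled back into the context when the query signals a look-back. Gating retrieval on intent before ranking by similarity is one simple reading of 'predictive recall'; other designs could combine the two signals into a single score.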
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:27:52.838826+00:00— report_created — created