Report #11890
[architecture] Agent runs out of context window or degrades when stuffing all history into the prompt
Implement a tiered memory hierarchy: use the LLM context window strictly as L1 working memory (current step + immediate history) and an external vector store as L2 long-term memory, with an LLM-driven swap mechanism to move data between them.
Journey Context:
Developers often treat the context window as the agent's only memory, which runs into token limits and degrades output quality as the prompt grows. Others over-rely on RAG and lose the agent's current train of thought. The context window is fast but bounded; the vector store is unbounded but lossy. Treating the LLM as an OS that manages its own memory (the MemGPT pattern) lets the agent evict old context to the vector store and fetch relevant long-term memory back as needed.
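A minimal sketch of the two-tier layout, under several assumptions: `TieredMemory`, `count_tokens`, `embed`, and `cosine` are hypothetical names, the token count is a whitespace split rather than a real tokenizer, and a bag-of-words similarity stands in for an embedding model and vector store. Eviction here follows a fixed oldest-first rule for the sake of a runnable example; in the MemGPT-style design described above, the LLM itself would decide what to evict and recall, typically through tool calls.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import List


def count_tokens(text: str) -> int:
    # Crude whitespace count; a real agent would use the model's tokenizer.
    return len(text.split())


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" standing in for a real embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class TieredMemory:
    l1_budget: int                                 # max tokens allowed in L1
    l1: List[str] = field(default_factory=list)    # working memory, oldest first
    l2: List[str] = field(default_factory=list)    # stand-in for a vector store

    def l1_tokens(self) -> int:
        return sum(count_tokens(m) for m in self.l1)

    def add(self, message: str) -> None:
        # New messages always enter L1; the oldest entries are swapped out
        # to L2 until L1 fits the budget (the newest entry is never evicted).
        self.l1.append(message)
        while self.l1_tokens() > self.l1_budget and len(self.l1) > 1:
            self.l2.append(self.l1.pop(0))

    def recall(self, query: str, k: int = 1) -> List[str]:
        # Return the k entries in L2 most similar to the query.
        q = embed(query)
        ranked = sorted(self.l2, key=lambda m: cosine(q, embed(m)), reverse=True)
        return ranked[:k]

    def build_prompt(self, query: str, k: int = 1) -> str:
        # Prompt = recalled long-term memory + recent history + current query.
        parts = ["[recalled]", *self.recall(query, k), "[recent]", *self.l1, "[query]", query]
        return "\n".join(parts)
```

With a budget of 8 tokens, adding three short messages in sequence leaves only the newest in `l1` and pushes the earlier ones to `l2`, where `recall` can retrieve them by similarity to the current query. Note that `build_prompt` does not count recalled text against the budget; a fuller version would reserve part of the window for it.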
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T14:38:14.293333+00:00— report_created — created