Report #27406
[architecture] Infinite Context Window Assumption for Long-Term Memory
Implement a two-tier memory system: use the LLM context window strictly for short-term working memory (current task, recent turns) and an external vector store (long-term memory) accessed via explicit tool calls (e.g., save_memory, search_memory).
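A minimal sketch of the long-term tier follows. It assumes an in-process list in place of a real vector database and a toy bag-of-words cosine similarity in place of a learned embedding model. The tool names save_memory and search_memory come from the report. LongTermMemory, embed, cosine, handle_tool_call, and the text/query/k argument keys are hypothetical names chosen for illustration.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    # ASSUMPTION: toy bag-of-words term counts stand in for a real embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class LongTermMemory:
    # Hypothetical in-process stand-in for an external vector store.
    entries: list = field(default_factory=list)  # (text, vector) pairs

    def save_memory(self, text: str) -> int:
        """Persist a fact the agent chose to remember; returns its index."""
        self.entries.append((text, embed(text)))
        return len(self.entries) - 1

    def search_memory(self, query: str, k: int = 3) -> list:
        """Return up to k stored texts ranked by similarity to the query."""
        q = embed(query)
        scored = [(cosine(q, vec), text) for text, vec in self.entries]
        hits = [pair for pair in scored if pair[0] > 0]  # drop unrelated entries
        hits.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in hits[:k]]


def handle_tool_call(memory: LongTermMemory, name: str, args: dict):
    """Route a tool call emitted by the LLM to the long-term store.

    Tool names follow the report; the argument keys are assumptions.
    """
    if name == "save_memory":
        return memory.save_memory(args["text"])
    if name == "search_memory":
        return memory.search_memory(args["query"], args.get("k", 3))
    raise ValueError(f"unknown tool: {name}")


memory = LongTermMemory()
handle_tool_call(memory, "save_memory", {"text": "User prefers metric units"})
handle_tool_call(memory, "save_memory", {"text": "Project deadline is in March"})
print(handle_tool_call(memory, "search_memory",
                       {"query": "Which units does the user prefer?", "k": 1}))
# prints ['User prefers metric units']
```

The design point is that nothing reaches the context window implicitly: the agent decides what to store, and only the top-k results of an explicit search are placed back into the prompt.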
Journey Context:
Developers often stuff the entire conversation history or all retrieved documents into the context window, assuming that larger context sizes (e.g., 128k tokens) solve memory. This fails for two reasons: models use information less reliably as the context grows, particularly information placed in the middle of a long prompt ('lost in the middle'), and per-request token cost grows linearly with context length. Forcing the agent to save to and retrieve from an external store explicitly bounds the context window, reduces cost, and keeps the model's attention on the immediate task.
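The bounding argument can be sketched as a short-term buffer with a fixed token budget plus a small cost calculation. This assumes a whitespace word count as a crude stand-in for a real tokenizer; WorkingMemory, count_tokens, and cumulative_input_tokens are hypothetical names. The buffer returns evicted turns so the caller can decide whether to persist them (for example via a save_memory call) before they leave the context. The cost helper illustrates why the bound matters: each request is linear in context length, but resending the full history on every turn makes the total over n turns grow roughly with n squared, whereas a fixed window keeps it roughly linear in n.

```python
from collections import deque
from typing import List, Optional


def count_tokens(text: str) -> int:
    # ASSUMPTION: whitespace word count as a crude proxy for a real tokenizer.
    return len(text.split())


class WorkingMemory:
    """Short-term context: keeps only the most recent turns within a token budget."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.turns = deque()
        self.used = 0

    def add_turn(self, text: str) -> List[str]:
        """Append a turn and evict the oldest turns until the budget holds.

        The newest turn is never evicted, even if it alone exceeds the budget.
        Returns the evicted turns so the caller can persist them if needed.
        """
        self.turns.append(text)
        self.used += count_tokens(text)
        evicted = []
        while self.used > self.max_tokens and len(self.turns) > 1:
            old = self.turns.popleft()
            self.used -= count_tokens(old)
            evicted.append(old)
        return evicted

    def context(self) -> str:
        # What would actually be sent to the model as recent history.
        return "\n".join(self.turns)


def cumulative_input_tokens(turns: int, tokens_per_turn: int,
                            window: Optional[int]) -> int:
    """Total prompt tokens sent over a whole conversation.

    window=None resends the full history every turn; otherwise each request
    is capped at `window` tokens.
    """
    total = 0
    for n in range(1, turns + 1):
        history = n * tokens_per_turn
        total += history if window is None else min(history, window)
    return total


wm = WorkingMemory(max_tokens=8)
wm.add_turn("user: plan the trip")  # 4 tokens, fits
evicted = wm.add_turn("assistant: booked flights for monday")  # 9 > 8, oldest evicted
print(evicted)  # prints ['user: plan the trip']
```

Under the same crude token count, 100 turns of 200 tokens each cost 1,010,000 input tokens in total when the full history is resent, versus 191,000 with a 2,000-token window.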
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T00:23:55.242052+00:00 - report_created - created