Report #11114
[architecture] Agent runs out of context window or degrades in performance because it stuffs all retrieved memory into the prompt
Implement a two-tier memory architecture: a finite working memory (the context window) for the current task trajectory, and an effectively unbounded long-term memory (vector DB or KV store) for cross-session facts. Inject only summaries or highly relevant snippets into working memory, not raw historical logs.
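A minimal sketch of the two tiers, assuming a toy in-process store in place of a real vector DB or KV store. The class names (LongTermMemory, WorkingMemory), the build_prompt helper, and the word-overlap relevance score are all hypothetical and chosen for illustration; a real system would likely use embedding similarity and a persistent backend.

```python
from dataclasses import dataclass, field


@dataclass
class LongTermMemory:
    """Stand-in for a vector DB / KV store holding cross-session facts."""
    facts: list[str] = field(default_factory=list)

    def store(self, fact: str) -> None:
        self.facts.append(fact)

    def search(self, query: str, top_k: int = 2) -> list[str]:
        # Toy relevance score: number of words shared with the query.
        # (Assumption: a real system would use embedding similarity.)
        query_words = set(query.lower().split())
        scored = [(len(query_words & set(f.lower().split())), f) for f in self.facts]
        relevant = [pair for pair in scored if pair[0] > 0]
        relevant.sort(key=lambda pair: pair[0], reverse=True)
        return [fact for _, fact in relevant[:top_k]]


@dataclass
class WorkingMemory:
    """Bounded buffer for the current task trajectory."""
    max_items: int = 4
    items: list[str] = field(default_factory=list)

    def add(self, item: str) -> None:
        self.items.append(item)
        # Keep only the most recent entries so the buffer stays finite.
        self.items = self.items[-self.max_items:]


def build_prompt(task: str, working: WorkingMemory, long_term: LongTermMemory) -> str:
    # Only the top-ranked facts are injected, never the whole store.
    relevant = long_term.search(task)
    parts = ["Relevant facts:", *relevant, "Recent steps:", *working.items, "Task: " + task]
    return "\n".join(parts)
```

The point of the split is that long-term memory can grow without limit while the prompt only ever sees the bounded working buffer plus a handful of retrieved facts.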
Journey Context:
LLMs suffer from the 'lost in the middle' effect and attention dilution when the context is too long, and RAG pipelines often over-retrieve. Working memory should be ephemeral and tightly scoped, while long-term memory handles persistence. The tradeoff is that summarization loses granular detail, whereas raw injection risks exceeding the context limit and increases latency and cost.
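One way to handle the tradeoff between detail and context size is to pack retrieved snippets under a fixed token budget, preferring raw text and falling back to a summary only when the raw text no longer fits. The sketch below is illustrative only: the Snippet type, pack_context, and the whitespace-based count_tokens are hypothetical, and a real agent would use the model's own tokenizer and precomputed or generated summaries.

```python
from typing import NamedTuple


class Snippet(NamedTuple):
    score: float   # relevance to the current task (higher is better)
    text: str      # raw retrieved text
    summary: str   # shorter, lossy version of the same text


def count_tokens(text: str) -> int:
    # Crude approximation (assumption): one token per whitespace-separated word.
    return len(text.split())


def pack_context(snippets: list[Snippet], budget: int) -> list[str]:
    """Greedily fill the budget, most relevant snippet first.

    Uses the raw text when it fits, the summary when only that fits,
    and skips the snippet otherwise.
    """
    chosen: list[str] = []
    remaining = budget
    for snippet in sorted(snippets, key=lambda s: s.score, reverse=True):
        for candidate in (snippet.text, snippet.summary):
            cost = count_tokens(candidate)
            if cost <= remaining:
                chosen.append(candidate)
                remaining -= cost
                break
    return chosen
```

With a hard budget, over-retrieval upstream cannot overflow the prompt: low-relevance snippets are degraded to summaries or dropped rather than injected raw.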
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T12:37:15.132621+00:00— report_created — created