Report #24240
[architecture] Agent exhausts its context window or hallucinates when too much history is stuffed into the prompt
Implement a two-tier memory system: working memory (the context window) for the current task, and long-term memory (a vector store) for cross-session facts. Summarize old working memory before evicting it.
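A minimal sketch of the two tiers in Python 3.9+, using only the standard library. Everything here is a hypothetical stand-in, not a specific framework's API. `count_tokens` approximates tokens by whitespace-separated words, `summarize` is a placeholder for an LLM "summarize within N tokens" call (here it just keeps the last N words), and `LongTermMemory` replaces a real vector store with a toy bag-of-words cosine search. The point is the eviction rule: when the recent turns overflow their budget, the oldest turn is archived verbatim in long-term memory and folded into a bounded running summary before it leaves working memory.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    # Crude approximation: whitespace-separated words. Use the model's tokenizer in practice.
    return len(text.split())


def summarize(text: str, max_tokens: int) -> str:
    # Hypothetical placeholder for an LLM summarization call with a length limit.
    # Here: keep only the last max_tokens words (recency-biased truncation).
    if max_tokens <= 0:
        return ""
    return " ".join(text.split()[-max_tokens:])


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; swap in a real embedding model.
    words = (w.lower().strip(".,!?") for w in text.split())
    return Counter(w for w in words if w)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class LongTermMemory:
    # In-memory stand-in for a vector store: (embedding, original text) pairs.
    items: list = field(default_factory=list)

    def add(self, text: str) -> None:
        self.items.append((embed(text), text))

    def search(self, query: str, k: int = 3) -> list[str]:
        # Rank stored texts by cosine similarity; drop zero-similarity hits.
        q = embed(query)
        scored = [(cosine(q, vec), text) for vec, text in self.items]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [text for score, text in scored[:k] if score > 0]


@dataclass
class WorkingMemory:
    long_term: LongTermMemory
    max_tokens: int = 40      # total budget: running summary + recent turns
    summary_tokens: int = 10  # slice of the budget reserved for the summary
    turns: list = field(default_factory=list)
    summary: str = ""

    def tokens(self) -> int:
        return count_tokens(self.summary) + sum(count_tokens(t) for t in self.turns)

    def add(self, turn: str) -> None:
        self.turns.append(turn)
        turn_budget = self.max_tokens - self.summary_tokens
        # Evict the oldest turns until the rest fit, always keeping the newest turn.
        while sum(count_tokens(t) for t in self.turns) > turn_budget and len(self.turns) > 1:
            oldest = self.turns.pop(0)
            self.long_term.add(oldest)  # archive verbatim for later deep retrieval
            # Fold the evicted turn into the summary, capped at summary_tokens.
            self.summary = summarize(f"{self.summary} {oldest}", self.summary_tokens)
```

Because the summary has its own fixed slice of the budget, the total stays within `max_tokens` as long as the newest turn fits in the remaining turn budget; a single oversized turn is kept rather than dropped.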
Journey Context:
Developers often try to stuff the entire conversation history into the prompt. This hits token limits, increases cost, and degrades attention (the 'lost in the middle' phenomenon, where models make less use of information placed in the middle of a long context). Pure RAG, in turn, lacks recency and situational awareness, because similarity search alone does not track the state of the current task. A two-tier architecture with summarization bridges the gap: the LLM operates on a compact, highly relevant working context and falls back to the vector store for deep retrieval.
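The read path (building each prompt from both tiers) can be sketched as a single function. `build_prompt` and the `retrieve(query, k)` callable are hypothetical names; in a real system `retrieve` would wrap a vector-store similarity query. The sketch pulls a few long-term facts under their own token budget, skips facts already present in the recent turns, and orders the sections so that retrieved facts and the summary come first and the current request comes last. That ordering is one common heuristic for the lost-in-the-middle problem, not something this report prescribes.

```python
from typing import Callable


def count_tokens(text: str) -> int:
    # Crude whitespace approximation; use the model's tokenizer in practice.
    return len(text.split())


def build_prompt(
    query: str,
    summary: str,
    recent_turns: list[str],
    retrieve: Callable[[str, int], list[str]],
    retrieval_budget: int = 50,
    k: int = 5,
) -> str:
    """Assemble a compact prompt from working and long-term memory (hypothetical helper)."""
    # Deep retrieval: take top-k facts, skipping duplicates of the working context
    # and anything that would exceed the retrieval token budget.
    in_context = set(recent_turns)
    facts, used = [], 0
    for fact in retrieve(query, k):
        cost = count_tokens(fact)
        if fact in in_context or used + cost > retrieval_budget:
            continue
        facts.append(fact)
        used += cost

    sections = []
    if facts:
        sections.append("Relevant long-term facts:\n" + "\n".join(f"- {fact}" for fact in facts))
    if summary:
        sections.append("Summary of earlier conversation:\n" + summary)
    if recent_turns:
        sections.append("Recent turns:\n" + "\n".join(recent_turns))
    # Put the current request last so it sits at the end of the context window.
    sections.append("Current request:\n" + query)
    return "\n\n".join(sections)
```

Keeping retrieval under a separate budget means a burst of loosely related long-term hits cannot crowd out the recent turns the agent actually needs.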
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:05:34.745355+00:00 - report_created - created