Report #81797
[architecture] Agent running out of context window or hallucinating from too much history
Implement a tiered memory architecture: use the context window as short-term working memory, and a vector store as long-term memory. Periodically summarize conversation history and move it to long-term storage.
Journey Context:
Agents often either stuff the entire conversation into the prompt or rely solely on RAG. Keeping everything in context does not scale and raises cost and latency; relying on RAG alone loses conversational flow and the immediate reasoning state. The hybrid approach keeps recent turns in context for immediate coherence, compresses older turns into summaries, and extracts semantic facts into a vector DB for cross-session retrieval.
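A minimal sketch of the tiered layout described above, under stated assumptions: `TieredMemory`, `embed`, `cosine`, and `naive_summarize` are hypothetical names invented for illustration, the bag-of-words "embedding" stands in for a real embedding model and vector DB, and the summarizer is a placeholder for what would normally be an LLM call. For brevity it stores evicted turns verbatim in long-term memory rather than extracting semantic facts from them.

```python
import math
import re
from collections import Counter, deque
from typing import Callable, Deque, List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; an assumption standing in for a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def naive_summarize(previous: str, evicted: List[str]) -> str:
    """Placeholder summarizer: keeps the first 60 characters of each evicted turn.
    A real system would likely call an LLM here."""
    parts = [previous] if previous else []
    parts += [turn[:60] for turn in evicted]
    return " | ".join(parts)


class TieredMemory:
    def __init__(
        self,
        max_recent: int = 4,
        summarize: Callable[[str, List[str]], str] = naive_summarize,
    ) -> None:
        self.max_recent = max_recent
        self.summarize = summarize
        self.recent: Deque[str] = deque()                # short-term: verbatim turns
        self.summary = ""                                # mid-term: rolling summary
        self.long_term: List[Tuple[Counter, str]] = []   # long-term: (vector, text)

    def add_turn(self, turn: str) -> None:
        """Append a turn; turns beyond max_recent are summarized and archived."""
        self.recent.append(turn)
        evicted: List[str] = []
        while len(self.recent) > self.max_recent:
            evicted.append(self.recent.popleft())
        if evicted:
            self.summary = self.summarize(self.summary, evicted)
            for old in evicted:
                self.long_term.append((embed(old), old))

    def recall(self, query: str, k: int = 2) -> List[str]:
        """Return up to k archived turns with non-zero similarity to the query."""
        q = embed(query)
        scored = [(cosine(q, vec), text) for vec, text in self.long_term]
        scored = [item for item in scored if item[0] > 0]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [text for _, text in scored[:k]]

    def build_prompt(self, query: str) -> str:
        """Assemble the prompt: retrieved memory, summary, recent turns, query."""
        sections: List[str] = []
        recalled = self.recall(query)
        if recalled:
            sections.append("Relevant long-term memory:\n" + "\n".join(recalled))
        if self.summary:
            sections.append("Summary of earlier conversation:\n" + self.summary)
        sections.append("Recent turns:\n" + "\n".join(self.recent))
        sections.append("User: " + query)
        return "\n\n".join(sections)
```

Calling `add_turn` on every message and `build_prompt` before each model call keeps the prompt bounded by `max_recent` plus the summary and the top-`k` retrieved items. The eviction threshold, summary length, and `k` are tuning choices, not values taken from this report.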
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:53:19.148514+00:00— report_created — created