Report #90498
[architecture] Agent exceeds context window or hallucinates when the entire conversation history is stuffed into the prompt
Implement a two-tier memory architecture: short-term working memory (context window) for the immediate task, and long-term memory (vector store) for historical context. Retrieve from long-term only when needed based on the current query.
Journey Context:
Agents often treat the context window as the sole memory store. This hits token limits and degrades output quality through the 'lost in the middle' phenomenon, where LLMs tend to overlook information placed in the middle of a long context. Vector stores scale to long histories but lose the immediate sequential context of the conversation. The right call is to keep the active reasoning chain in context, offload historical semantics to a vector DB, and bridge the two with targeted retrieval driven by the current query.
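A minimal sketch of the two-tier idea, under stated assumptions: every name here (`TwoTierMemory`, `embed`, `cosine`, `working_size`, `min_score`) is hypothetical and not taken from any library. The bag-of-words `embed` function is only a stand-in for a real embedding model, and the in-memory list is a stand-in for a vector DB. Recent turns stay verbatim in working memory, evicted turns move to long-term memory, and long-term entries are pulled into the prompt only when they score above a similarity threshold for the current query.

```python
import math
from collections import Counter, deque


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; an assumption standing in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class TwoTierMemory:
    def __init__(self, working_size: int = 4, min_score: float = 0.2):
        self.working = deque()   # short-term: recent turns, kept verbatim and in order
        self.long_term = []      # long-term: (vector, text) pairs, stand-in for a vector DB
        self.working_size = working_size
        self.min_score = min_score

    def add(self, turn: str) -> None:
        """Append a turn; evict the oldest turns into long-term memory when full."""
        self.working.append(turn)
        while len(self.working) > self.working_size:
            old = self.working.popleft()
            self.long_term.append((embed(old), old))

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        """Return up to k long-term entries whose similarity to the query passes the threshold."""
        q = embed(query)
        scored = [(cosine(q, vec), text) for vec, text in self.long_term]
        scored = [pair for pair in scored if pair[0] >= self.min_score]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]

    def build_prompt(self, query: str) -> str:
        """Assemble the prompt: retrieved history (if any), recent turns, then the query."""
        parts = []
        recalled = self.retrieve(query)
        if recalled:
            parts.append("Relevant history:\n" + "\n".join(recalled))
        parts.append("Recent turns:\n" + "\n".join(self.working))
        parts.append("Query: " + query)
        return "\n\n".join(parts)


mem = TwoTierMemory(working_size=2)
mem.add("user prefers postgres for the billing database")
mem.add("agent listed open tickets")
mem.add("user asked about deploy schedule")
mem.add("agent confirmed friday deploy")
print(mem.build_prompt("which database does billing use"))
```

The threshold is what makes retrieval conditional: a query with no sufficiently similar history adds nothing from long-term memory, so the prompt stays bounded by `working_size` plus at most `k` recalled entries instead of growing with the conversation.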
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:29:50.339854+00:00— report_created — created