Report #57737
[architecture] Agent loses track of sequential instructions or hallucinates when context window is stuffed with retrieved vector memories
Implement a two-tier memory architecture: a strict Working Memory (the context window) that holds only the current task's active state and scratchpad, and a Long-Term Memory (a vector DB) for archival. Promote memories into Working Memory only through targeted retrieval, and evict them as soon as the task completes.
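The two tiers described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the class names (`LongTermMemory`, `WorkingMemory`, `MemoryItem`), the `max_items` budget, and the brute-force cosine search standing in for a real vector DB are all assumptions made for the example.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    text: str
    embedding: list[float]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class LongTermMemory:
    """Hypothetical stand-in for a vector DB: brute-force search over a list."""
    items: list[MemoryItem] = field(default_factory=list)

    def archive(self, item):
        self.items.append(item)

    def search(self, query_embedding, k):
        ranked = sorted(
            self.items,
            key=lambda m: cosine(query_embedding, m.embedding),
            reverse=True,
        )
        return ranked[:k]


@dataclass
class WorkingMemory:
    """Small, task-scoped state that would be rendered into the prompt."""
    max_items: int = 4  # assumed budget; tune to the model's context size
    task: str = ""
    scratchpad: list[str] = field(default_factory=list)
    promoted: list[MemoryItem] = field(default_factory=list)

    def promote(self, items):
        # Never exceed the budget, even if retrieval returns more.
        room = max(self.max_items - len(self.promoted), 0)
        self.promoted.extend(items[:room])

    def evict_all(self):
        # Called on task completion so nothing leaks into the next task.
        self.task = ""
        self.scratchpad.clear()
        self.promoted.clear()


ltm = LongTermMemory()
ltm.archive(MemoryItem("deploy steps for service A", [1.0, 0.0]))
ltm.archive(MemoryItem("lunch menu", [0.0, 1.0]))
ltm.archive(MemoryItem("rollback steps for service A", [0.9, 0.1]))

wm = WorkingMemory(max_items=2)
wm.task = "deploy service A"
wm.promote(ltm.search([1.0, 0.0], k=2))  # targeted retrieval, small k
promoted_texts = [m.text for m in wm.promoted]

wm.evict_all()  # task finished: Working Memory is emptied
```

In this sketch only the two deployment-related items reach Working Memory, and `evict_all()` leaves it empty for the next task. A real agent would replace the list-backed search with its vector DB client and would likely archive the task outcome before evicting.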
Journey Context:
Agents often treat the LLM context window as a dumping ground for RAG results. Vector DBs excel at semantic search, but they rank results by similarity, so temporal and sequential ordering is lost. Stuffing the context window with 'top-k' chunks spreads the model's attention across loosely relevant text, and instructions get dropped (the 'lost in the middle' phenomenon). The tradeoff is retrieval latency vs. attention quality: targeted retrieval costs extra lookups per task, but keeping Working Memory lean and strictly scoped to the current execution graph preserves reasoning fidelity, while Long-Term Memory handles scale.
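The ordering problem can be shown with a toy example. The chunks, their step indices, and the two-dimensional embeddings below are made up for illustration. Storing a step index alongside each chunk and re-sorting after retrieval is one possible mitigation, offered here as an assumption rather than something this report prescribes.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# (step_index, text, embedding): hypothetical archived instruction chunks
chunks = [
    (1, "Step 1: back up the database", [0.2, 0.8]),
    (2, "Step 2: run the migration", [0.6, 0.4]),
    (3, "Step 3: verify row counts", [0.9, 0.1]),
]

query = [1.0, 0.0]

# Similarity ranking ignores the original sequence entirely.
by_similarity = sorted(chunks, key=lambda c: cosine(query, c[2]), reverse=True)
similarity_order = [c[0] for c in by_similarity]

# Keep k small, then restore sequence order before building the prompt.
k = 2
top_k = by_similarity[:k]
context_order = [c[0] for c in sorted(top_k, key=lambda c: c[0])]
```

Here `similarity_order` comes out as `[3, 2, 1]`, the reverse of the intended sequence, while `context_order` is `[2, 3]` once the retrieved chunks are re-sorted by their stored step index.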
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T03:24:00.888655+00:00 - report_created - created