Report #13554
[architecture] Agent runs out of context window or hallucinates from stuffing too much retrieved memory into the prompt
Implement a two-tier virtual context management system: use the LLM context window strictly as working memory for the immediate reasoning step, and a vector store as long-term memory. Only inject highly relevant summaries or facts into working memory, never raw documents.
Journey Context:
Agents often retrieve the top-K chunks and dump all of them into the prompt. This leads to context pollution, lost-in-the-middle effects (material buried mid-prompt gets overlooked), and high latency and cost. The alternative is selective injection: treat the LLM context as expensive RAM and the vector store as disk, page in only what is strictly necessary for the current reasoning step, and evict it when the step is done.
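The RAM/disk analogy can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. Everything named here is hypothetical: `LongTermStore`, `WorkingMemory`, the `min_score` threshold, and the whitespace token estimate are assumptions, and `embed` is a toy bag-of-words stand-in for a real embedding model and vector database.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class LongTermStore:
    """The 'disk': holds short summaries or facts, never raw documents."""
    entries: list = field(default_factory=list)  # (summary, vector) pairs

    def add(self, summary: str) -> None:
        self.entries.append((summary, embed(summary)))

    def search(self, query: str, min_score: float) -> list:
        """Return (score, summary) pairs at or above min_score, best first."""
        q = embed(query)
        scored = [(cosine(q, vec), text) for text, vec in self.entries]
        return sorted((s for s in scored if s[0] >= min_score), reverse=True)


@dataclass
class WorkingMemory:
    """The 'RAM': a bounded set of facts for one reasoning step."""
    token_budget: int
    slots: list = field(default_factory=list)

    @staticmethod
    def cost(text: str) -> int:
        # Crude whitespace estimate; a real system would use the model tokenizer.
        return len(text.split())

    def page_in(self, store: LongTermStore, query: str, min_score: float = 0.2) -> None:
        """Inject only relevant summaries that still fit in the budget."""
        used = sum(self.cost(s) for s in self.slots)
        for _score, summary in store.search(query, min_score):
            if summary in self.slots:
                continue  # already resident
            if used + self.cost(summary) > self.token_budget:
                continue  # would overflow working memory, so skip it
            self.slots.append(summary)
            used += self.cost(summary)

    def evict(self) -> None:
        """Clear working memory once the reasoning step is finished."""
        self.slots.clear()

    def render(self, task: str) -> str:
        """Build the prompt fragment for the current step."""
        facts = "\n".join(f"- {s}" for s in self.slots)
        return f"Relevant facts:\n{facts}\n\nTask: {task}"


store = LongTermStore()
store.add("Refund policy: refunds are allowed within 30 days of purchase")
store.add("Shipping: orders ship within 2 business days")
store.add("The office cafeteria menu changes every Monday")

memory = WorkingMemory(token_budget=50)
question = "What is the refund policy for a purchase?"
memory.page_in(store, question)
prompt = memory.render(question)  # pass this to the model, then evict
memory.evict()
```

The relevance threshold filters out weak matches instead of always taking a fixed top-K, and the budget check keeps the prompt bounded no matter how large the store grows. Both values would need tuning against a real embedding model.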
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T19:08:39.811276+00:00— report_created — created