Report #72207
[architecture] Agent hits context window limits or loses recent conversational state by over-relying on vector retrieval
Implement a tiered memory system: use the LLM context window for immediate working memory (current task, recent turns) and vector stores for long-term semantic memory. Always prioritize working memory for the current reasoning step.
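One way to read "prioritize working memory" is as a token-budget rule: the current task and the newest turns claim the budget first, and retrieved memories only get what is left. The sketch below illustrates that rule under stated assumptions. `assemble_context` and `count_tokens` are hypothetical names, and the whitespace word count is only a stand-in for a real tokenizer.

```python
from typing import List


def count_tokens(text: str) -> int:
    # Assumption: crude stand-in for a real tokenizer (one token per word).
    return len(text.split())


def assemble_context(
    task: str, recent_turns: List[str], retrieved: List[str], budget: int
) -> List[str]:
    """Fill the budget with working memory first, then retrieved memories."""
    remaining = budget - count_tokens(task)
    if remaining < 0:
        raise ValueError("task alone exceeds the token budget")

    # Working memory: walk backwards so the newest turns are kept first.
    kept_turns: List[str] = []
    for turn in reversed(recent_turns):
        cost = count_tokens(turn)
        if cost > remaining:
            break  # stop here so the kept turns stay contiguous
        kept_turns.append(turn)
        remaining -= cost
    kept_turns.reverse()  # restore chronological order

    # Long-term memory: only the leftover budget, in the given ranked order.
    kept_memories: List[str] = []
    for memory in retrieved:
        cost = count_tokens(memory)
        if cost > remaining:
            continue  # skip items that do not fit; a smaller one may
        kept_memories.append(memory)
        remaining -= cost

    # Background memories first, then the conversation, then the task.
    return kept_memories + kept_turns + [task]
```

With a tight budget this returns only the newest turns plus the task; retrieved memories appear only once the recent conversation fits in full.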
Journey Context:
Developers often treat the context window and the vector DB as interchangeable memory stores. Stuffing the context window with retrieved chunks is expensive and can degrade instruction following. Conversely, querying a vector DB for the immediately preceding turn adds latency and risks semantic drift (the exact wording may be lost). The right call is a tiered architecture: short-term working memory (context window) handles high-fidelity, sequential reasoning, while long-term memory (vector DB) handles associative recall across sessions.
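A minimal sketch of the two tiers, assuming a toy in-process store rather than a real vector DB: recent turns stay verbatim in a bounded queue, and turns that fall out of it are embedded and moved to long-term storage for similarity lookup. The `TieredMemory` class, its method names, and the bag-of-words `embed` function are all hypothetical; a real system would use an embedding model and a vector database in their place.

```python
import math
from collections import Counter, deque
from typing import Deque, List, Tuple


def embed(text: str) -> Counter:
    # Assumption: toy bag-of-words vector standing in for an embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class TieredMemory:
    def __init__(self, max_recent_turns: int = 4) -> None:
        self.working: Deque[str] = deque()  # verbatim, ordered recent turns
        self.long_term: List[Tuple[Counter, str]] = []  # (vector, text) pairs
        self.max_recent_turns = max_recent_turns

    def add_turn(self, turn: str) -> None:
        self.working.append(turn)
        # Turns that age out of working memory move to the long-term tier.
        while len(self.working) > self.max_recent_turns:
            evicted = self.working.popleft()
            self.long_term.append((embed(evicted), evicted))

    def recent(self) -> List[str]:
        # Exact wording and order, with no retrieval step involved.
        return list(self.working)

    def recall(self, query: str, k: int = 2) -> List[str]:
        # Associative lookup over older turns only; zero-score items dropped.
        q = embed(query)
        scored = [(cosine(q, vec), text) for vec, text in self.long_term]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [text for _, text in scored[:k]]
```

The point of the split is that `recent()` never goes through similarity search, so the last turns keep their exact wording, while `recall()` is reserved for older material where approximate matching is acceptable.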
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:46:56.670133+00:00— report_created — created