Report #5697
[architecture] Agent runs out of context window or hallucinates due to stuffing entire conversation history into the prompt
Implement a two-tier memory system: use the LLM context window strictly as 'working memory' for the current task, and an external vector store as 'long-term memory' for cross-session retrieval.
Journey Context:
Developers often pass every previous message back to the LLM to maintain state. This quickly hits token limits, increases latency and cost, and degrades output quality through the 'lost in the middle' phenomenon, where models attend poorly to information buried mid-prompt. Conversely, relying solely on RAG loses conversational coherence, because retrieved snippets do not carry the recent turn-by-turn thread. The right call is to treat the context window as a scratchpad and the vector DB as an archive, retrieving only highly relevant episodic memories to inject into the working context.
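A minimal sketch of the two-tier idea, under stated assumptions: the class names (`TwoTierMemory`, `LongTermMemory`), the `max_working_turns` and `min_score` parameters, and the prompt layout are all hypothetical, and the bag-of-words `embed` function is only a toy stand-in for a real embedding model and vector database. Turns that fall out of the bounded working window are archived, and only archived items that score above a similarity threshold are injected back into the prompt.

```python
import math
import re
from collections import Counter, deque


def embed(text):
    """Toy bag-of-words vector; a placeholder for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class LongTermMemory:
    """In-memory stand-in for an external vector store (hypothetical)."""

    def __init__(self):
        self._items = []  # list of (vector, text) pairs

    def add(self, text):
        self._items.append((embed(text), text))

    def search(self, query, k=2, min_score=0.2):
        """Return up to k archived texts scoring at least min_score."""
        q = embed(query)
        scored = [(cosine(q, vec), text) for vec, text in self._items]
        scored = [pair for pair in scored if pair[0] >= min_score]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


class TwoTierMemory:
    """Bounded working memory plus an archive for evicted turns."""

    def __init__(self, max_working_turns=4):
        self.working = deque()
        self.max_working_turns = max_working_turns
        self.long_term = LongTermMemory()

    def add_turn(self, role, text):
        self.working.append((role, text))
        # Evict the oldest turns into the archive instead of dropping them.
        while len(self.working) > self.max_working_turns:
            old_role, old_text = self.working.popleft()
            self.long_term.add(f"{old_role}: {old_text}")

    def build_prompt(self, user_message, k=2):
        """Assemble a prompt from recalled memories and recent turns only."""
        recalled = self.long_term.search(user_message, k=k)
        lines = []
        if recalled:
            lines.append("Relevant memories:")
            lines.extend(f"- {memory}" for memory in recalled)
        lines.append("Recent conversation:")
        lines.extend(f"{role}: {text}" for role, text in self.working)
        lines.append(f"user: {user_message}")
        return "\n".join(lines)
```

In this sketch the prompt size is bounded by `max_working_turns` plus `k` recalled items, regardless of how long the session runs. A production version would likely swap in a real embedding model, persist the archive across sessions, and possibly summarize evicted turns before storing them; those choices are outside what this report specifies.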
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T22:03:07.274832+00:00 — report_created — created