Report #30229
[architecture] Agent runs out of context window or degrades in performance when the entire conversation history is stuffed into the prompt
Implement a two-tier memory architecture: a short-term working context (limited to recent N turns or summarized history) and a long-term semantic memory (vector store). Evict from working context by summarizing older turns into the long-term store.
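The eviction step described above could look roughly like the following. This is a minimal sketch, not a reference implementation: `TwoTierMemory`, `naive_summarize`, `max_turns`, and `evict_batch` are hypothetical names, the summarizer is a truncating placeholder where a real system would likely call an LLM, and a plain list stands in for the vector store.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Turn = Tuple[str, str]  # (role, text)


def naive_summarize(turns: List[Turn]) -> str:
    """Placeholder summarizer (assumption): keeps the first 80 characters
    of each turn. A real system would likely call an LLM here."""
    return " | ".join(f"{role}: {text[:80]}" for role, text in turns)


class TwoTierMemory:
    """Hypothetical two-tier memory: bounded working context plus a long-term store."""

    def __init__(
        self,
        max_turns: int = 4,
        evict_batch: int = 2,
        summarize: Callable[[List[Turn]], str] = naive_summarize,
    ) -> None:
        self.max_turns = max_turns
        self.evict_batch = evict_batch
        self.summarize = summarize
        self.working: Deque[Turn] = deque()  # short-term tier, sent with every prompt
        self.long_term: List[str] = []       # stand-in for a vector store

    def add_turn(self, role: str, text: str) -> None:
        self.working.append((role, text))
        # Once the working context exceeds its limit, summarize the oldest
        # turns and move that summary into the long-term tier.
        if len(self.working) > self.max_turns:
            n = min(self.evict_batch, len(self.working))
            evicted = [self.working.popleft() for _ in range(n)]
            self.long_term.append(self.summarize(evicted))
```

Evicting in small batches rather than one turn at a time keeps the number of summarization calls down; the batch size here is an arbitrary illustrative default.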
Journey Context:
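LLMs suffer from the 'lost in the middle' phenomenon: information placed in the middle of a long context is used less reliably than information near the start or end, so quality drops as the prompt grows. Simply increasing the context window size does not fix this and raises latency and cost, since compute for standard self-attention grows roughly quadratically with sequence length. Vector stores solve the capacity problem but lose the sequential order that step-by-step reasoning depends on. The right tradeoff is to keep only the active reasoning thread in context and use the vector store as a lookup table for facts.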
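As a rough illustration of that tradeoff, the sketch below assembles a prompt from the recent turns plus only the top-k facts looked up for the current query. Everything here is an assumption made for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, and `retrieve` and `build_prompt` are hypothetical helpers, not part of any particular library.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption); swap in a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, facts: List[str], k: int = 2) -> List[str]:
    """Return the k stored facts most similar to the query."""
    q = embed(query)
    ranked = sorted(facts, key=lambda f: cosine(q, embed(f)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, recent_turns: List[str], facts: List[str], k: int = 2) -> str:
    """Combine looked-up facts with the active reasoning thread only."""
    relevant = retrieve(query, facts, k)
    lines = ["Relevant facts:"]
    lines += [f"- {fact}" for fact in relevant]
    lines += ["Recent conversation:"]
    lines += recent_turns
    lines.append(f"User: {query}")
    return "\n".join(lines)
```

The prompt size then depends on the number of recent turns and on k, not on the total length of the conversation.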
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T05:07:40.276192+00:00— report_created — created