Report #55732
[architecture] Agent hits context window limits or suffers 'lost in the middle' degradation when all retrieved memory is stuffed into the prompt
Implement a two-tier memory architecture: working memory (context window) for the current task's active graph, and long-term memory (vector/graph store) for retrieval. Summarize older working memory before moving it to long-term storage, keeping only the current decision-relevant facts in context.
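A minimal sketch of the two-tier idea, under stated assumptions: `WorkingMemory`, `LongTermMemory`, `count_tokens`, and `naive_summarize` are hypothetical names invented for illustration. The word-count "tokenizer", the truncating "summarizer", and the keyword-overlap search are stand-ins for a real tokenizer, an LLM summarization call, and a vector/graph store.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Crude stand-in: whitespace word count, not a real tokenizer.
    return len(text.split())


def naive_summarize(texts: List[str]) -> str:
    # Placeholder for an LLM summarization call:
    # keep only the first 8 words of each evicted entry.
    return " | ".join(" ".join(t.split()[:8]) for t in texts)


@dataclass
class LongTermMemory:
    records: List[str] = field(default_factory=list)

    def add(self, summary: str) -> None:
        self.records.append(summary)

    def search(self, query: str, k: int = 2) -> List[str]:
        # Stand-in for vector/graph retrieval: rank by shared lowercase words.
        q = set(query.lower().split())
        scored = [(len(q & set(r.lower().split())), r) for r in self.records]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [r for _, r in scored[:k]]


@dataclass
class WorkingMemory:
    long_term: LongTermMemory
    budget: int = 40  # assumed token budget for the in-context tier
    summarize: Callable[[List[str]], str] = naive_summarize
    entries: List[str] = field(default_factory=list)

    def used(self) -> int:
        return sum(count_tokens(e) for e in self.entries)

    def add(self, fact: str) -> None:
        self.entries.append(fact)
        evicted: List[str] = []
        # Evict oldest entries until under budget, always keeping the newest.
        while self.used() > self.budget and len(self.entries) > 1:
            evicted.append(self.entries.pop(0))
        if evicted:
            # Summarize before archiving, as the fix describes.
            self.long_term.add(self.summarize(evicted))
```

In use, the agent would call `WorkingMemory.add` after each step and call `LongTermMemory.search` only when the current step needs older history, so the prompt is built from `entries` plus a small number of retrieved summaries.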
Journey Context:
LLMs suffer from the 'lost in the middle' phenomenon, where recall drops for information placed in the center of a long context. Naively retrieving the top-K vectors and dumping them all into the prompt pollutes the context and drives up token costs. The tradeoff is retrieval latency versus prompt quality: fetching history on demand adds a lookup per step, but the prompt stays small and relevant. By keeping the context window lean and strictly focused on the current step's requirements, and relying on structured semantic search for deep history, you maintain high instruction-following accuracy without exhausting the context window.
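One way to keep retrieved material lean is to filter hits by relevance and a token budget before they reach the prompt, then order the survivors so the strongest sit at the edges of the context rather than the center. The sketch below is an illustration under assumptions, not part of the original report: `select_for_context` and `edge_order` are hypothetical helpers, the `min_score` and `budget` values are arbitrary, and the edge-first ordering is a commonly used mitigation rather than something this report prescribes.

```python
from typing import List, Tuple


def count_tokens(text: str) -> int:
    # Crude stand-in: whitespace word count, not a real tokenizer.
    return len(text.split())


def select_for_context(
    hits: List[Tuple[float, str]],
    min_score: float = 0.5,  # assumed relevance cutoff
    budget: int = 30,        # assumed token budget for retrieved text
) -> List[str]:
    """Keep the highest-scoring hits that clear the cutoff and fit the budget."""
    kept: List[str] = []
    used = 0
    for score, text in sorted(hits, key=lambda h: h[0], reverse=True):
        if score < min_score:
            break  # everything after this is even less relevant
        cost = count_tokens(text)
        if used + cost > budget:
            continue  # skip items that do not fit; a smaller one may
        kept.append(text)
        used += cost
    return kept  # ordered best-first


def edge_order(ranked: List[str]) -> List[str]:
    """Alternate best-first items between the front and the back,
    so the weakest items end up in the middle of the prompt."""
    front: List[str] = []
    back: List[str] = []
    for i, item in enumerate(ranked):
        (front if i % 2 == 0 else back).append(item)
    return front + back[::-1]
```

A caller would pass the raw `(score, text)` pairs from the store through `select_for_context`, then `edge_order`, and join the result into the prompt section reserved for retrieved history.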
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:02:26.150159+00:00— report_created — created