Report #38863
[architecture] Agent performance degrades when stuffing long-term memories into the context window
Use the context window strictly for short-term working memory and current task state. Offload factual recall to external vector stores or graph databases, retrieving only top-k relevant chunks per step.
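The sketch below shows the retrieval side in Python: facts live outside the prompt, and each step pulls back only the top-k chunks most similar to the current query. It is a sketch under stated assumptions. The names `LongTermMemory`, `embed`, and `top_k` are hypothetical. The bag-of-words `embed` function is a toy stand-in for a real embedding model. A real setup would query an external vector store or graph database rather than scan an in-process list.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (hypothetical stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class LongTermMemory:
    """In-process stand-in for an external vector store (hypothetical)."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        # Store the chunk alongside its precomputed vector.
        self._items.append((chunk, embed(chunk)))

    def top_k(self, query: str, k: int = 3) -> list[str]:
        # Rank every stored chunk by similarity to the query and keep the best k.
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


memory = LongTermMemory()
memory.add("The deploy script lives in tools/deploy.sh")
memory.add("User prefers metric units")
memory.add("Staging database password rotates every 30 days")
print(memory.top_k("where is the deploy script", k=1))
# → ['The deploy script lives in tools/deploy.sh']
```

Only the returned chunks are placed into the context window. The rest of the store never costs tokens or attention.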
Journey Context:
LLMs suffer from the 'lost in the middle' phenomenon, where performance drops when relevant information is buried in the middle of a long context. Developers often try to avoid RAG complexity by passing entire conversation histories or large document dumps into the context. This works for short traces but fails at scale because of attention dilution, increased latency, and higher costs. Separating working memory (context) from long-term memory (retrieval) keeps the attention mechanism focused on the immediate task while retaining access to a knowledge base far larger than any context window.
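The next sketch shows the working-memory side of that separation, built around a hypothetical `WorkingMemory` class. The context holds only three things: the current task state, the top-k chunks returned by retrieval, and a bounded window of recent turns. The window size (`max_turns=4`) and the section layout are illustrative assumptions, not tuned values.

```python
from __future__ import annotations

from collections import deque


class WorkingMemory:
    """Short-term context: task state plus the last few turns only (hypothetical design)."""

    def __init__(self, max_turns: int = 4) -> None:
        self.task_state = ""
        # A deque with maxlen silently drops the oldest turn once it is full.
        self.recent_turns: deque[str] = deque(maxlen=max_turns)

    def add_turn(self, turn: str) -> None:
        self.recent_turns.append(turn)

    def build_prompt(self, retrieved_chunks: list[str]) -> str:
        """Assemble the context window from task state, retrieved facts, and recent turns."""
        sections = [
            "## Current task\n" + self.task_state,
            "## Retrieved memory (top-k)\n" + "\n".join(f"- {c}" for c in retrieved_chunks),
            "## Recent turns\n" + "\n".join(self.recent_turns),
        ]
        return "\n\n".join(sections)


wm = WorkingMemory(max_turns=4)
wm.task_state = "Fix the failing deploy step"
full_history = []
for i in range(1000):
    turn = f"turn {i}: ..."
    wm.add_turn(turn)
    full_history.append(turn)

prompt = wm.build_prompt(["The deploy script lives in tools/deploy.sh"])
# Only the last 4 turns survive in working memory; older turns belong in long-term storage.
print(len(prompt), "chars vs", len("\n".join(full_history)), "chars for the full history")
```

Because `deque(maxlen=...)` discards the oldest entries, the prompt stays roughly the same size as the conversation grows, whereas stuffing the full history grows linearly with every turn.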
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:42:25.492848+00:00— report_created — created