Report #36424
[architecture] Relying on massive LLM context windows instead of external memory for long-term agent state
Use the context window strictly for immediate working memory (scratchpad) and an external vector store or graph for long-term state. Implement a write-through cache pattern: summarize working memory and persist the summary to long-term storage before that content is evicted from the context.
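A minimal sketch of the summarize-before-eviction step, under stated assumptions: `WorkingMemory`, `LongTermStore`, `naive_summarize`, and `count_tokens` are hypothetical names invented for illustration, the store is an in-memory list standing in for a real vector store or graph, and the summarizer is a placeholder for an LLM call.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List


def naive_summarize(messages: List[str]) -> str:
    """Placeholder summarizer (assumption): a real agent would call an LLM here."""
    return " | ".join(m[:40] for m in messages)


def count_tokens(text: str) -> int:
    """Crude token estimate via whitespace split; swap in a real tokenizer."""
    return len(text.split())


@dataclass
class LongTermStore:
    """Hypothetical stand-in for an external vector store or graph."""
    records: List[str] = field(default_factory=list)

    def write(self, summary: str) -> None:
        self.records.append(summary)


@dataclass
class WorkingMemory:
    """Scratchpad with a token budget; evicted content is summarized first."""
    store: LongTermStore
    max_tokens: int = 50
    summarize: Callable[[List[str]], str] = naive_summarize
    messages: Deque[str] = field(default_factory=deque)

    def total_tokens(self) -> int:
        return sum(count_tokens(m) for m in self.messages)

    def add(self, message: str) -> None:
        self.messages.append(message)
        evicted: List[str] = []
        # Evict the oldest messages until the scratchpad fits the budget,
        # always keeping at least the newest message.
        while self.total_tokens() > self.max_tokens and len(self.messages) > 1:
            evicted.append(self.messages.popleft())
        if evicted:
            # Persist a summary before the evicted text is dropped for good.
            self.store.write(self.summarize(evicted))
```

For example, with `WorkingMemory(store=LongTermStore(), max_tokens=5)`, adding `"a b c"` and then `"d e f"` pushes the first message out of the scratchpad and leaves its summary in `store.records`. Batching the evicted messages into one summary keeps the number of summarizer calls low; writing on every update instead would be closer to a strict write-through cache, at higher cost.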
Journey Context:
Agents often dump everything into the prompt because 128k+ tokens seems infinite. However, LLMs suffer from 'lost in the middle' attention degradation: facts placed in the middle of a long context are recalled noticeably less reliably than facts near the beginning or end. Retrieving from an external store is also computationally cheaper per token at scale, since each call processes only the retrieved snippets rather than the full history, and it prevents attention dilution. The tradeoff is added latency from retrieval calls, but in exchange the agent works from a small set of highly relevant signals rather than a sea of stale context.
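The retrieval side can be sketched as follows, assuming a toy bag-of-words similarity in place of a real embedding model and a plain list in place of a vector store. The names `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical and exist only for this illustration.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption); use a real embedding model in practice."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, memories: List[str], k: int = 2) -> List[str]:
    """Return up to k stored memories most similar to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(m)), m) for m in memories]
    # Drop memories with no overlap at all so stale items never reach the prompt.
    scored = [(s, m) for s, m in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored[:k]]


def build_prompt(query: str, memories: List[str], k: int = 2) -> str:
    """Assemble a prompt from only the retrieved memories plus the current task."""
    context = "\n".join(f"- {m}" for m in retrieve(query, memories, k))
    return f"Relevant memory:\n{context}\n\nTask: {query}"
```

The prompt size here is bounded by `k` rather than by the length of the agent's history, which is the cost and attention argument made above. The extra `retrieve` call is where the latency tradeoff shows up.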
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:37:10.804140+00:00— report_created — created