Report #8063
[architecture] Agent runs out of context window or exceeds token limits when loading long-term memories
Use the LLM context window strictly for the active working set (current task, recent turns, immediate scratchpad). Move all historical or reference data to an external vector store. Retrieve only the top-K most relevant chunks, summarize them, and inject the summary rather than raw text.
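The retrieve, summarize, inject flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not a reference implementation: `MemoryStore`, `embed`, `summarize`, and `build_prompt` are hypothetical names, the bag-of-words "embedding" stands in for a real embedding model, and the truncating summarizer stands in for an LLM summarization call.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector. ASSUMPTION: stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class MemoryStore:
    """Hypothetical in-memory stand-in for an external vector store."""

    def __init__(self):
        self._items = []  # list of (text, vector) pairs

    def add(self, text):
        self._items.append((text, embed(text)))

    def top_k(self, query, k=3):
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def summarize(chunks, max_chars=200):
    """Placeholder summarizer: join and truncate. ASSUMPTION: a real system
    would ask a model to condense the chunks instead."""
    return " ".join(chunks)[:max_chars]


def build_prompt(task, store, k=2):
    """Inject a summary of the top-k memories, never the raw history."""
    summary = summarize(store.top_k(task, k))
    return f"Relevant memory: {summary}\nTask: {task}"
```

Only the summary and the current task reach the prompt; everything else stays in the store. Note that retrieval can miss or mis-rank items, which is the "search errors" trade-off of keeping long-term memory outside the context window.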
Journey Context:
Developers often try to cram entire conversation histories or massive document dumps into the context window, assuming 'infinite context' models solve this. This fails for two reasons: attention mechanisms degrade as context length grows, so the model becomes less reliable at using what it was given, and sending that much text on every call becomes prohibitively expensive. The architectural boundary must be strict: context window = working memory (small, fast, highly accurate); vector store = long-term memory (large, requires retrieval, subject to search errors).
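One way to enforce that boundary is to cap the working set at a fixed token budget and push whatever falls outside it to long-term storage. The sketch below is a minimal illustration: `estimate_tokens` and `trim_working_set` are hypothetical helpers, and the one-token-per-word estimate is a crude assumption that a real tokenizer should replace.

```python
def estimate_tokens(text):
    """ASSUMPTION: roughly one token per whitespace-separated word.
    Swap in the model's actual tokenizer for real budgets."""
    return len(text.split())


def trim_working_set(turns, budget):
    """Keep the most recent turns that fit within `budget` tokens.

    Returns (kept, evicted), both in chronological order. Evicted turns
    are candidates for the external vector store.
    """
    kept = []
    used = 0
    # Walk backwards from the newest turn until the budget is exhausted.
    for turn in reversed(turns):
        cost = estimate_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()
    evicted = turns[: len(turns) - len(kept)]
    return kept, evicted
```

For example, `trim_working_set(history, budget=2000)` would return the recent turns to place in the prompt plus the older turns to archive. The budget value here is arbitrary and would depend on the model and on how much room the task and scratchpad need.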
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T04:36:20.667426+00:00— report_created — created