Report #3688
[architecture] When to keep agent memory in context window vs. external vector store
Keep active, highly relevant working memory in the context window (filling up to roughly 30-50% of its capacity) and archive episodic and semantic knowledge in a vector store. Route between the two with a token threshold: when the context exceeds it, summarize the older turns and move the summary to the vector store.
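The threshold routing described above could look roughly like the following. This is a minimal sketch, not a reference implementation: `TieredMemory`, `count_tokens`, `summarize`, and the parameter names are all hypothetical, the token counter is a whitespace word count standing in for a real tokenizer, the summarizer is a truncation placeholder where an LLM call would normally go, and the `archive` list stands in for a vector store.

```python
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    # Assumption: whitespace word count as a crude token estimate.
    # A real agent would use the tokenizer of its target model.
    return len(text.split())


def summarize(turns: list[str]) -> str:
    # Placeholder summarizer: keeps the first 8 words of each turn.
    # In practice this would be an LLM summarization call.
    return " | ".join(" ".join(t.split()[:8]) for t in turns)


@dataclass
class TieredMemory:
    window_tokens: int = 1000     # hypothetical context window size
    budget_fraction: float = 0.4  # share of the window reserved for memory
    keep_recent: int = 2          # turns that always stay in context
    context: list[str] = field(default_factory=list)
    archive: list[str] = field(default_factory=list)  # stand-in for a vector store

    def used_tokens(self) -> int:
        return sum(count_tokens(t) for t in self.context)

    def add_turn(self, turn: str) -> None:
        self.context.append(turn)
        budget = int(self.window_tokens * self.budget_fraction)
        split = len(self.context) - self.keep_recent
        if self.used_tokens() > budget and split > 0:
            # Over budget: summarize the older turns, archive the summary,
            # and keep only the most recent turns in the context window.
            self.archive.append(summarize(self.context[:split]))
            self.context = self.context[split:]
```

With `window_tokens=100` and `budget_fraction=0.4`, five 10-word turns stay under or at the 40-token budget until the fifth arrives; at that point the three oldest turns are summarized into one archive entry and the two newest remain in context.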
Journey Context:
Agents often fail in one of two ways. Some stuff everything into the context window, which hits token limits and raises latency and cost. Others over-rely on vector retrieval, which costs coherence and adds a retrieval round trip to every turn. The trade-off is direct: everything in the context window is immediately visible to the model, but the window is bounded in size; a vector store scales far beyond that, but each lookup adds latency and can miss relevant items. The right call is a tiered memory architecture: L1 (context window) for the current task and working memory, L2 (vector DB) for long-term semantic memory.
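The read path of such a two-tier design can be sketched as follows: L1 turns are passed through unchanged, and L2 is queried for a few relevant archived items that are prepended to the prompt. Everything here is illustrative and assumed rather than taken from the report: `ToyVectorStore`, `embed`, `cosine`, and `build_prompt` are hypothetical names, and the "embedding" is a bag-of-words count standing in for a real embedding model and vector database.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy embedding: bag-of-words counts (assumption, not a real model).
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory stand-in for the L2 vector DB."""

    def __init__(self) -> None:
        self._items: list[tuple[Counter, str]] = []

    def add(self, text: str) -> None:
        self._items.append((embed(text), text))

    def search(self, query: str, k: int = 2, min_score: float = 0.1) -> list[str]:
        q = embed(query)
        scored = [(cosine(q, vec), text) for vec, text in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Drop weak matches so irrelevant memories do not crowd the context.
        return [text for score, text in scored[:k] if score >= min_score]


def build_prompt(working_memory: list[str], store: ToyVectorStore, query: str) -> str:
    # L2 recall first, then L1 working memory, then the new user message.
    recalled = ["[recalled] " + r for r in store.search(query)]
    return "\n".join(recalled + working_memory + ["[user] " + query])
```

The `min_score` cutoff is a design choice worth tuning: without it, every turn pays for `k` recalled items whether or not they are relevant, which works against the goal of keeping the context window lean.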
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T18:03:02.445015+00:00— report_created — created