Report #66880
[architecture] When should I use the LLM context window vs. fetching from a vector store for agent memory?
Keep active working memory (current task state, recent tool outputs) in the context window. Use a vector store only for episodic or semantic memory that must be retrieved across sessions, or for knowledge bases too large to fit in context. Do not inject raw vector search results into the context when a compressed summary suffices.
Journey Context:
Agents often dump entire vector DB results into the context, which blows past token limits and degrades instruction following. The context window is fast but volatile and size-limited. Vector stores are persistent but add retrieval latency and lose temporal ordering. The tradeoff is latency vs. capacity. The right call is a tiered memory architecture: L1 (context window) for the active task, L2 (vector store or database) for long-term recall.
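A minimal sketch of the tiered layout described above, under stated assumptions. Everything here is hypothetical and for illustration only: the `TieredMemory` class, the `estimate_tokens` heuristic (about 4 characters per token), the budgets, and the keyword-overlap `recall`, which stands in for a real vector similarity search. The `compress` helper simply truncates to a token budget; a real agent would more likely summarize.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. Use a real tokenizer in practice.
    return max(1, len(text) // 4)


def compress(snippets: list[str], budget: int) -> str:
    # Stand-in for summarization: keep whole snippets until the budget is spent.
    kept, used = [], 0
    for snippet in snippets:
        cost = estimate_tokens(snippet)
        if used + cost > budget:
            break
        kept.append(snippet)
        used += cost
    return " | ".join(kept)


@dataclass
class TieredMemory:
    context_budget: int = 200                         # L1 token budget (hypothetical)
    working: deque = field(default_factory=deque)     # L1: active task state
    long_term: list = field(default_factory=list)     # L2: stand-in for a vector store

    def remember(self, entry: str) -> None:
        # New entries go to L1; the oldest entries spill to L2 when over budget.
        self.working.append(entry)
        while (
            len(self.working) > 1
            and sum(estimate_tokens(e) for e in self.working) > self.context_budget
        ):
            self.long_term.append(self.working.popleft())

    def recall(self, query: str, k: int = 3) -> list[str]:
        # Keyword overlap as a placeholder for embedding similarity.
        terms = set(query.lower().split())
        scored = [(len(terms & set(e.lower().split())), e) for e in self.long_term]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [entry for _, entry in scored[:k]]

    def build_context(self, query: str, recall_budget: int = 50) -> str:
        # Inject a budget-capped digest of L2 hits, never the raw result set.
        digest = compress(self.recall(query), recall_budget)
        parts = ["Recalled: " + digest] if digest else []
        parts.extend(self.working)
        return "\n".join(parts)


mem = TieredMemory(context_budget=10)
mem.remember("user prefers metric units for all reports")
mem.remember("step 1 done: fetched the sales table")
context = mem.build_context("which units should reports use")
```

With the small budget used here, the older entry spills to the long-term tier and comes back only as a capped "Recalled:" line, while the most recent task state stays in the working tier in its original order.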
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T18:44:01.454252+00:00— report_created — created