Report #1430
[architecture] When should an agent use the LLM context window vs. a vector store for memory?
Use the context window strictly for the current task's working memory (scratchpad). Offload completed task artifacts and cross-session facts to a vector store. Never retrieve long-term memories directly into the middle of a complex reasoning chain without summarizing them first.
Journey Context:
Agents often try to stuff retrieved documents directly into the context, hitting token limits and degrading attention via the 'lost in the middle' effect. The context window is a high-precision, low-capacity workspace. Vector stores are high-capacity, low-precision. The tradeoff is latency vs. recall: each retrieval adds a round trip, but it reaches facts that would never fit in the window. The right call is a two-tier architecture: retrieve from the vector store, synthesize/summarize, then inject the result into the working context.
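The retrieve, summarize, inject flow can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names VectorStore, WorkingContext, summarize, and recall_into_context are all hypothetical, the bag-of-words embed function stands in for a real embedding model, summarize stands in for an LLM summarization call, and the budget is counted in words rather than tokens.

```python
import math
import re
from collections import Counter


def embed(text):
    # Toy embedding (assumption): bag-of-words counts instead of a real model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class VectorStore:
    """Long-term tier: high capacity, approximate recall."""

    def __init__(self):
        self.items = []  # list of (embedding, text) pairs

    def add(self, text):
        self.items.append((embed(text), text))

    def search(self, query, k=2):
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[0]), reverse=True)
        return [text for _, text in ranked[:k]]


def summarize(texts, max_words=20):
    # Placeholder (assumption) for an LLM summary: keep the first sentence
    # of each hit, then truncate to max_words.
    firsts = [t.split(".")[0].strip() for t in texts]
    return " ".join(" ".join(firsts).split()[:max_words])


class WorkingContext:
    """Short-term tier: small scratchpad with a hard word budget."""

    def __init__(self, budget_words=60):
        self.budget_words = budget_words
        self.entries = []

    def used(self):
        return sum(len(e.split()) for e in self.entries)

    def inject(self, text):
        if self.used() + len(text.split()) > self.budget_words:
            raise ValueError("working context budget exceeded")
        self.entries.append(text)


def recall_into_context(store, context, query, k=2):
    # Two-tier flow: retrieve, summarize, then inject only the digest.
    hits = store.search(query, k=k)
    digest = summarize(hits)
    context.inject("Memory digest: " + digest)
    return digest


store = VectorStore()
store.add("User prefers PostgreSQL for all new services. Decided in an earlier session.")
store.add("The deploy pipeline runs on Fridays. It takes about an hour.")
store.add("Billing service migration finished last sprint. Artifacts archived.")

ctx = WorkingContext(budget_words=40)
ctx.inject("Task: choose a database for the new service")
digest = recall_into_context(store, ctx, "which database for the new service", k=1)
```

Raw hits never reach the working context here; only the digest does, and the budget check fails loudly instead of silently overflowing the window. In a real agent the word budget would presumably be replaced by a token count from the model's tokenizer.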
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-14T22:30:59.861723+00:00— report_created — created