Report #13185
[architecture] Over-stuffing the context window with retrieved documents instead of using a vector store for long-term knowledge
Use the context window strictly for operational state (current task, recent turns, active tools) and a vector store for episodic/semantic knowledge. Inject only the top-K most relevant facts into the context.
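The "inject only the top-K" step can be sketched in Python. This is a minimal sketch under stated assumptions, not a reference implementation. A toy bag-of-words `embed` function stands in for a real embedding model, and a linear scan stands in for a vector database's approximate nearest-neighbour index. `embed`, `cosine`, and `VectorStore` are hypothetical names, not any library's API.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector.
    A real system would call an embedding model here."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """Hypothetical in-memory store; a linear scan stands in for an ANN index."""

    def __init__(self) -> None:
        self._items = []  # list of (fact, vector) pairs

    def add(self, fact: str) -> None:
        # Embed once at write time so queries only need to embed the query text.
        self._items.append((fact, embed(fact)))

    def top_k(self, query: str, k: int = 3) -> list:
        # Score every stored fact against the query and keep the k best.
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [fact for fact, _ in ranked[:k]]
```

Only the strings returned by `top_k` are placed in the prompt. The rest of the knowledge base never leaves the store, so per-turn token cost is bounded by `k` rather than by the size of the corpus.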
Journey Context:
Agents often try to cram entire knowledge bases into the prompt. This hits token limits, inflates costs, and degrades instruction-following because the relevant details get buried in irrelevant text (the 'needle in a haystack' effect). Vector stores handle scale, but retrieval is lossy (similarity search can miss relevant facts) and adds latency. The right architecture is a two-tier system: fast, exact context for 'working memory' and approximate, scalable vector retrieval for 'long-term memory'.
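A minimal sketch of the two tiers follows. It assumes whitespace-separated words approximate tokens and uses a simple word-overlap ranking in place of real vector search. `TwoTierMemory`, `count_tokens`, `retrieve`, and `build_prompt` are hypothetical names. Working memory is a bounded deque kept verbatim. Long-term facts are pulled in best-first, and only while they fit under a token budget.

```python
from collections import deque


def count_tokens(text: str) -> int:
    """Hypothetical token counter: whitespace-separated words.
    Replace with your model's real tokenizer."""
    return len(text.split())


class TwoTierMemory:
    """Sketch: exact working memory plus selectively injected long-term memory."""

    def __init__(self, max_turns: int = 4) -> None:
        # Working memory: the most recent turns, kept verbatim; older turns fall off.
        self.turns = deque(maxlen=max_turns)
        # Long-term memory: facts that stay out of the prompt until retrieved.
        self.long_term = []

    def add_turn(self, turn: str) -> None:
        self.turns.append(turn)

    def archive(self, fact: str) -> None:
        self.long_term.append(fact)

    def retrieve(self, query: str, k: int) -> list:
        # Placeholder ranking by shared words; a real system would query a vector store.
        q = set(query.lower().split())
        scored = [(len(q & set(f.lower().split())), f) for f in self.long_term]
        relevant = [item for item in scored if item[0] > 0]  # drop facts with no overlap
        relevant.sort(key=lambda item: item[0], reverse=True)
        return [f for _, f in relevant[:k]]

    def build_prompt(self, task: str, budget: int, k: int = 3) -> str:
        # Operational state goes in first and is never truncated.
        parts = [f"TASK: {task}", *self.turns]
        used = sum(count_tokens(p) for p in parts)
        # Retrieved facts are added best-first until the next one would exceed the budget.
        for fact in self.retrieve(task, k):
            line = f"FACT: {fact}"
            cost = count_tokens(line)
            if used + cost > budget:
                break
            parts.append(line)
            used += cost
        return "\n".join(parts)
```

In this sketch, the operational state (task plus recent turns) is always included in full, even if it alone exceeds the budget. Only retrieved facts compete for the remaining space. Stopping at the first fact that does not fit preserves ranking order. Skipping that fact and trying smaller, lower-ranked ones would fill the budget more fully, but at some cost to relevance.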
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T18:08:34.360309+00:00 - report_created - created