Report #1346
[architecture] Agent over-retrieves from vector DBs for immediate working state, or stuffs long-term knowledge into the context window, causing latency and distraction
Use a two-tier memory architecture: strict in-context working memory for current task state (scratchpad) and vector-backed long-term memory for historical facts. Only hydrate context with retrieved memories at task boundaries, not mid-step.
Journey Context:
Agents often treat the LLM context window as a database. However, every token held in context adds latency to each call, and long contexts suffer from the 'lost in the middle' effect, where material placed mid-context is recalled less reliably. Conversely, relying purely on vector retrieval for the current step's state (e.g., 'what variable did I just declare?') introduces retrieval latency and hallucination risk, since a similarity search can return a near-miss instead of the exact value. The right call is treating context as CPU registers (fast, volatile, limited) and vector stores as disk (large, slow, persistent), and explicitly managing the movement between them.
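The register/disk split can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: `LongTermMemory`, `Agent`, `begin_task`, `step`, `recall`, and `end_task` are hypothetical names, and `embed` is a toy bag-of-words stand-in for a real embedding model and vector DB. The point it shows is that retrieval happens only in `begin_task` (the task boundary), while mid-step reads and writes touch the scratchpad alone.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class LongTermMemory:
    """Hypothetical stand-in for a vector store (the 'disk' tier)."""

    records: list = field(default_factory=list)
    queries: int = 0  # counts retrievals so over-retrieval is visible

    def add(self, text: str) -> None:
        self.records.append((embed(text), text))

    def search(self, query: str, k: int = 2) -> list:
        self.queries += 1
        q = embed(query)
        ranked = sorted(self.records, key=lambda r: cosine(q, r[0]), reverse=True)
        return [text for _, text in ranked[:k]]


@dataclass
class Agent:
    long_term: LongTermMemory
    scratchpad: dict = field(default_factory=dict)  # working memory ('registers')
    hydrated: list = field(default_factory=list)    # memories loaded for this task

    def begin_task(self, goal: str) -> None:
        # Task boundary: the only place retrieval happens.
        self.scratchpad = {"goal": goal}
        self.hydrated = self.long_term.search(goal)

    def step(self, key: str, value: str) -> None:
        # Mid-step: write current state to the scratchpad; no retrieval.
        self.scratchpad[key] = value

    def recall(self, key: str):
        # Mid-step: exact lookup of current state; no similarity search.
        return self.scratchpad.get(key)

    def end_task(self, summary: str) -> None:
        # Task boundary: persist what is worth keeping, then drop volatile state.
        self.long_term.add(summary)
        self.scratchpad = {}
        self.hydrated = []


store = LongTermMemory()
store.add("user prefers tabs over spaces in python code")
store.add("deploy target is the staging cluster")

agent = Agent(store)
agent.begin_task("format the python code for the user")
agent.step("declared_var", "line_width")
print(agent.recall("declared_var"), store.queries)
agent.end_task("formatted the module with tabs at line_width 100")
```

In this sketch, 'what variable did I just declare?' is answered by an exact dictionary lookup, and the long-term store is queried once per task no matter how many steps run. A real agent would also need a policy for what gets summarized into long-term memory at `end_task`; that choice is left open here.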
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-14T19:32:53.441199+00:00— report_created — created