Report #1408
[architecture] Agent runs out of context window or ignores instructions because it stuffs all retrieved memory into the system prompt
Implement a two-tier memory architecture: working memory (context window) for the current task trajectory, and long-term memory (vector DB) for cross-session facts. Only inject relevant long-term memory into working memory on-demand, and summarize working memory before archiving.
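A minimal sketch of the two tiers, to make the separation concrete. Everything named here is hypothetical and not taken from any particular framework: `LongTermMemory` is an in-process stand-in for a vector DB, `embed` is a toy bag-of-words substitute for a real embedding model, and `summarize` is a placeholder where an LLM summarization call would normally go.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model (assumption): bag-of-words counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class LongTermMemory:
    """Hypothetical stand-in for a vector DB holding cross-session facts."""
    facts: list = field(default_factory=list)

    def add(self, text: str) -> None:
        self.facts.append((text, embed(text)))

    def search(self, query: str, k: int = 2, min_score: float = 0.1) -> list:
        # Return at most k facts, and only those above the relevance floor.
        q = embed(query)
        scored = sorted(
            ((cosine(q, vec), text) for text, vec in self.facts), reverse=True
        )
        return [text for score, text in scored[:k] if score >= min_score]


@dataclass
class WorkingMemory:
    """Current task trajectory: the only thing that enters the context window."""
    system_prompt: str
    turns: list = field(default_factory=list)

    def build_prompt(self, task: str, ltm: LongTermMemory, k: int = 2) -> str:
        # Inject long-term memory on demand: top-k relevant facts, not the whole store.
        recalled = ltm.search(task, k=k)
        parts = [self.system_prompt]
        if recalled:
            parts.append("Relevant memory:\n" + "\n".join(f"- {m}" for m in recalled))
        parts.extend(self.turns)
        parts.append(f"Task: {task}")
        return "\n\n".join(parts)

    def archive(self, ltm: LongTermMemory, summarize) -> None:
        # Summarize the trajectory before archiving, then clear working memory.
        if self.turns:
            ltm.add(summarize(self.turns))
            self.turns.clear()


def summarize(turns: list) -> str:
    # Placeholder summarizer (assumption): a real agent would call an LLM here.
    return "Summary: " + " | ".join(turns)[:200]


ltm = LongTermMemory()
ltm.add("User prefers dark mode")
ltm.add("Deploy target is staging cluster")
ltm.add("Billing contact is finance team")

wm = WorkingMemory(system_prompt="You are a deployment agent.")
prompt = wm.build_prompt("deploy service to staging", ltm)

wm.turns.append("tool: rollout finished without errors")
wm.archive(ltm, summarize)
```

In this sketch the system prompt stays fixed and small. Only facts that pass the relevance floor reach the prompt, so unrelated facts such as the dark-mode preference stay in the store. The `k` and `min_score` parameters are the knobs that trade recall against context size.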
Journey Context:
Agents commonly treat the LLM context window as a database, which causes context pollution, attention dilution, and token-limit overruns. The tradeoff is latency: fetching from a vector DB adds round-trip time, but keeping the context window lean helps preserve instruction-following accuracy. The right call is strict separation: the context window is compute, not storage.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-14T21:31:16.789984+00:00— report_created — created