Report #40559
[architecture] Agent runs out of context window or loses early instructions when adding RAG results
Implement a tiered memory architecture: L1 (working memory/context window), L2 (session-scoped semantic memory), L3 (long-term persistent memory). Only promote data to L1 when it is actively needed for the current reasoning step.
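A minimal Python sketch of the L2/L3 side of this design, under stated assumptions. `MemoryItem`, `TieredMemory`, `recall`, and `score` are hypothetical names. Plain dicts stand in for the session store and for the persistent store, which in practice would likely be a vector or document database. Word overlap stands in for embedding similarity. `recall` checks the fast session tier first, falls back to L3 only when L2 comes up short, and returns a small ranked set of candidates for promotion into L1.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    key: str
    text: str


def score(query: str, text: str) -> float:
    # Hypothetical stand-in for embedding similarity: share of query words present in text.
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & set(text.lower().split())) / len(query_words)


@dataclass
class TieredMemory:
    # L2: session-scoped memory, discarded when the session ends.
    l2: dict[str, MemoryItem] = field(default_factory=dict)
    # L3: long-term memory; a dict stands in for a persistent vector or document store.
    l3: dict[str, MemoryItem] = field(default_factory=dict)

    def remember(self, item: MemoryItem, persistent: bool = False) -> None:
        (self.l3 if persistent else self.l2)[item.key] = item

    def recall(self, query: str, k: int = 2, min_score: float = 0.5) -> list[MemoryItem]:
        """Return up to k relevant items as candidates for promotion into L1."""
        hits = [item for item in self.l2.values() if score(query, item.text) >= min_score]
        if len(hits) < k:
            # Pay the slower L3 lookup only when the session tier comes up short.
            seen = {item.key for item in hits}
            for item in self.l3.values():
                if item.key not in seen and score(query, item.text) >= min_score:
                    hits.append(item)
                    self.l2[item.key] = item  # cache in L2 for later steps this session
        hits.sort(key=lambda item: score(query, item.text), reverse=True)
        return hits[:k]
```

For example, if a refund policy is stored only in L3, recalling "refund policy" returns it and also caches it in L2, so later steps in the same session skip the L3 round trip. Nothing here writes into the context window; promotion into L1 stays a separate, explicit decision.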
Journey Context:
Agents often stuff the context window with raw retrieved chunks, pushing out the system prompt or early conversation turns. The opposite failure is over-summarizing retrieved content, which loses the details the task needs. The tradeoff is latency and accuracy versus capacity: L1 is fast but small, while L3 is large but adds retrieval latency and can surface irrelevant context. Treating context as a finite resource means explicitly paging data into and out of L1, so that the LLM context window is managed like CPU registers rather than like a hard drive.
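The paging discipline can be sketched as a bounded L1 buffer. This is an illustration under assumptions, not a reference implementation. `ContextWindow`, `page_in`, `render`, and `count_tokens` are hypothetical names. A whitespace word count stands in for a real tokenizer, and `paged_out` is a dict standing in for demotion back to L2. The system prompt is pinned outside the evictable pages, so adding retrieved chunks can never push it out. When the budget is exceeded, the least recently used chunk is demoted instead of being silently truncated.

```python
from collections import OrderedDict


def count_tokens(text: str) -> int:
    # Rough stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


class ContextWindow:
    """L1 working memory with a hard token budget and a pinned system prompt."""

    def __init__(self, budget: int, system_prompt: str) -> None:
        if count_tokens(system_prompt) > budget:
            raise ValueError("system prompt alone exceeds the context budget")
        self.budget = budget
        self.system_prompt = system_prompt  # pinned: never evicted
        self.pages: OrderedDict[str, str] = OrderedDict()  # least recently used first
        self.paged_out: dict[str, str] = {}  # stand-in for demotion back to the L2 store

    def used(self) -> int:
        return count_tokens(self.system_prompt) + sum(
            count_tokens(text) for text in self.pages.values()
        )

    def page_in(self, key: str, text: str) -> None:
        """Load a chunk needed by the current step, evicting LRU chunks to stay in budget."""
        if count_tokens(self.system_prompt) + count_tokens(text) > self.budget:
            raise ValueError(f"chunk {key!r} does not fit even with every other page evicted")
        if key in self.pages:
            self.pages.move_to_end(key)  # already resident: just mark as recently used
            return
        self.paged_out.pop(key, None)  # returning from L2, if it was demoted earlier
        self.pages[key] = text
        while self.used() > self.budget:
            old_key, old_text = self.pages.popitem(last=False)
            self.paged_out[old_key] = old_text  # demote instead of silently dropping

    def render(self) -> str:
        """Assemble the prompt: the pinned system prompt first, then resident pages."""
        return "\n\n".join([self.system_prompt, *self.pages.values()])
```

LRU is only one possible eviction policy. Ranking resident pages by relevance to the current reasoning step, and evicting the least relevant first, is a natural alternative for agents whose steps jump between topics.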
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:33:02.296129+00:00 - report_created - created