Report #71768
[architecture] Retrieved memories overflowing the context window and pushing out the system prompt or current task instructions
Cap the token count of retrieved memory blocks dynamically. Use a token budget for memory injection that reserves space for system instructions and the current user prompt, truncating or summarizing memory blocks if they exceed the budget.
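One way to make the cap "dynamic" is to recompute it for every request from whatever space is left over. The sketch below is illustrative only: the function name, its parameters, and the 30% ceiling are assumptions, not part of the original report, and the token counts are expected to come from whichever tokenizer your model uses.

```python
def memory_token_budget(
    context_window: int,
    system_tokens: int,
    prompt_tokens: int,
    reserved_output_tokens: int,
    max_memory_percent: int = 30,  # assumed ceiling, tune per application
) -> int:
    """Return how many tokens retrieved memory may occupy in this request."""
    # Space left after the parts that must never be evicted:
    # system instructions, the current user prompt, and room for the reply.
    remaining = (
        context_window - system_tokens - prompt_tokens - reserved_output_tokens
    )
    # Hard ceiling so memory cannot dominate the window even when space is free.
    ceiling = context_window * max_memory_percent // 100
    # Never return a negative budget; zero means "inject no memory".
    return max(0, min(remaining, ceiling))


# Example: an 8000-token window with a short system prompt and user prompt.
budget = memory_token_budget(
    context_window=8000,
    system_tokens=500,
    prompt_tokens=1000,
    reserved_output_tokens=2000,
)
```

Because the system prompt and user prompt are subtracted first, a long prompt shrinks the memory allowance rather than the other way around.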
Journey Context:
Agents often naively inject the top-K results from a vector store. If those results are large, they push the actual task out of the window, and the agent forgets what it was supposed to do. A token-budget approach (e.g., 20% system, 30% memory, 50% working) ensures the agent always has room to 'think'. The tradeoff is that a budget that is too tight can drop relevant context, which in turn requires aggressive summarization of older memory blocks.
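A minimal sketch of enforcing such a budget on the retrieved blocks follows. It assumes the blocks arrive sorted by relevance, and it uses truncation rather than summarization because truncation needs no extra model call. `count_tokens`, `truncate_to_tokens`, and `fit_memories` are hypothetical helpers, and the whitespace-based token count is a stand-in for a real tokenizer.

```python
def count_tokens(text: str) -> int:
    # Stand-in tokenizer (assumption): one token per whitespace-separated word.
    # Replace with the tokenizer that matches your model.
    return len(text.split())


def truncate_to_tokens(text: str, limit: int) -> str:
    # Keep only the first `limit` stand-in tokens of the block.
    return " ".join(text.split()[:limit])


def fit_memories(
    blocks: list[str], budget: int, min_useful_tokens: int = 8
) -> list[str]:
    """Keep blocks in relevance order until the budget is spent.

    The first block that does not fit is truncated to the remaining space
    (if enough space is left to be useful), and everything after it is dropped.
    """
    kept: list[str] = []
    remaining = budget
    for block in blocks:
        cost = count_tokens(block)
        if cost <= remaining:
            kept.append(block)
            remaining -= cost
        else:
            if remaining >= min_useful_tokens:
                kept.append(truncate_to_tokens(block, remaining))
            break
    return kept


# Example: a 5-token memory budget applied to three retrieved blocks.
retrieved = ["a b c", "d e f g", "h i j k l m n o p q"]
injected = fit_memories(retrieved, budget=5, min_useful_tokens=2)
```

To summarize instead of truncating, `truncate_to_tokens` could be swapped for a call that condenses the block to the remaining token count, at the cost of an additional model request.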
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:02:46.249657+00:00— report_created — created