Report #61423
[architecture] Agent retrieves too many memories and stuffs them all into the system prompt, leaving no room for the actual task context and reasoning
Cap the token count of retrieved memories to a fixed budget (e.g., 20% of the total context window). If the retrieved memories exceed the budget, re-rank them by relevance and drop the lowest-ranked ones until the set fits. Alternatively, expose memory as a tool the agent can query on demand rather than pre-filling the prompt.
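A minimal sketch of the budget cap described above, assuming each retrieved memory already carries a relevance score from the retriever. The names `Memory`, `estimate_tokens`, and `select_memories` are hypothetical, and the 4-characters-per-token estimate is a rough assumption that should be replaced with the model's real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    score: float  # relevance score from the retriever (higher is better); assumed to exist


def estimate_tokens(text: str) -> int:
    # Rough assumption: about 4 characters per token. Use a real tokenizer in practice.
    return max(1, len(text) // 4)


def select_memories(memories, context_window, budget_fraction=0.20):
    """Keep the highest-scoring memories that fit in a fixed share of the context window."""
    budget = int(context_window * budget_fraction)
    selected, used = [], 0
    # Re-rank by score, then fill the budget greedily.
    for memory in sorted(memories, key=lambda m: m.score, reverse=True):
        cost = estimate_tokens(memory.text)
        if used + cost > budget:
            continue  # does not fit; a smaller, lower-ranked memory may still fit
        selected.append(memory)
        used += cost
    return selected, used


memories = [
    Memory("a" * 400, 0.9),  # ~100 tokens
    Memory("b" * 400, 0.5),  # ~100 tokens
    Memory("c" * 40, 0.1),   # ~10 tokens
]
kept, tokens_used = select_memories(memories, context_window=600)
print([m.score for m in kept], tokens_used)
```

The greedy loop skips a memory that does not fit rather than stopping outright, so a short, lower-ranked memory can still use leftover budget. Stopping at the first overflow is an equally reasonable choice if strict rank order matters more than filling the budget.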
Journey Context:
Developers often assume more context is better and inject hundreds of kilobytes of RAG results into the prompt. This leads to the lost-in-the-middle problem, increased latency, and higher costs. The agent spends most of its compute reading background information instead of reasoning. The tradeoff is upfront context availability vs. agent agility. Exposing memory as a tool (tool-RAG) means the agent fetches only what it needs, when it needs it, preserving the context window for reasoning, though it adds an extra tool-call turn.
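The tool-RAG alternative can be sketched roughly as follows. Everything here is an illustration under stated assumptions: `MemoryStore`, `SEARCH_MEMORY_TOOL`, and `handle_tool_call` are hypothetical names, the keyword-overlap ranking stands in for a real embedding search, and the tool description uses a generic JSON-Schema-style shape rather than any specific vendor's API.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    entries: list[str] = field(default_factory=list)

    def search(self, query: str, top_k: int = 3) -> list[str]:
        # Toy ranking by keyword overlap; a real store would likely use embeddings.
        terms = set(query.lower().split())
        scored = [(len(terms & set(e.lower().split())), e) for e in self.entries]
        hits = [pair for pair in scored if pair[0] > 0]
        hits.sort(key=lambda pair: pair[0], reverse=True)
        return [entry for _, entry in hits[:top_k]]


# Hypothetical tool description handed to the agent instead of pre-filled memories.
SEARCH_MEMORY_TOOL = {
    "name": "search_memory",
    "description": "Look up stored memories relevant to a query.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "top_k": {"type": "integer"},
        },
        "required": ["query"],
    },
}


def handle_tool_call(store: MemoryStore, name: str, arguments: dict) -> list[str]:
    """Dispatch a tool call emitted by the agent (assumed to arrive as name + arguments)."""
    if name != SEARCH_MEMORY_TOOL["name"]:
        raise ValueError(f"unknown tool: {name}")
    return store.search(arguments["query"], arguments.get("top_k", 3))


store = MemoryStore([
    "user prefers dark mode",
    "deploy uses docker compose",
    "user timezone is UTC",
])
result = handle_tool_call(store, "search_memory", {"query": "docker deploy", "top_k": 1})
print(result)
```

With this shape, the system prompt carries only the short tool description, and each lookup costs one extra tool-call turn, which is the tradeoff noted above.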
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:35:02.619022+00:00— report_created — created