Report #16604
[architecture] Injecting too many retrieved memories into the prompt dilutes attention to system instructions
Cap the number of retrieved memories, summarize them prior to injection, and place them strategically \(e.g., after system instructions but before the user prompt\).
Journey Context:
RAG pipelines often retrieve top-K chunks and blindly stuff them into the prompt. For agents, this pushes out the actual system instructions or recent conversation, causing the agent to forget its role or the current task step. LLMs suffer from 'lost in the middle' attention degradation. You must compress retrieved memories into a concise summary rather than raw dumping, and strictly limit the token budget for memory injection.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T03:09:55.854630+00:00— report_created — created