Report #44615
[architecture] Retrieving top-K memories and injecting them all into the system prompt, overwhelming the agent's attention
Cap the injected memory context to a strict token budget and use an LLM-based evaluator or re-ranker to filter memories for relevance to the specific current query before injection.
Journey Context:
Top-K vector search returns K items regardless of their actual utility to the current step. Injecting 10 loosely related memories dilutes the agent's focus on the immediate task, leading to hallucinations or ignored instructions. Simply lowering K is not a fix, since it can drop the one memory that matters. The right call is a two-stage retrieval: high-recall vector search (top-K) followed by high-precision re-ranking or filtering to fit a tight token budget.
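A minimal sketch of the second stage described above, assuming the top-K candidates have already been fetched from the vector store. All names here (`Memory`, `select_memories`, `keyword_overlap`, `approx_tokens`) are hypothetical and not taken from any library. The word-count token estimate and the keyword-overlap scorer are crude stand-ins: in practice a real tokenizer and an LLM evaluator or cross-encoder re-ranker would likely take their place.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Memory:
    text: str
    vector_score: float  # similarity from the first-stage vector search


def approx_tokens(text: str) -> int:
    # Assumption: whitespace word count as a rough proxy for a real tokenizer.
    return len(text.split())


def keyword_overlap(query: str, text: str) -> float:
    # Toy relevance scorer standing in for an LLM evaluator or re-ranker:
    # the fraction of distinct query words that also appear in the memory.
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def select_memories(
    query: str,
    candidates: Sequence[Memory],
    relevance: Callable[[str, str], float],
    token_budget: int,
    min_relevance: float = 0.5,
) -> List[Memory]:
    """Re-rank high-recall candidates and keep only what fits the budget."""
    scored = [(relevance(query, m.text), m) for m in candidates]
    # Filter first: a memory below the threshold is dropped even if it fits.
    scored = [(s, m) for s, m in scored if s >= min_relevance]
    scored.sort(key=lambda pair: pair[0], reverse=True)

    selected: List[Memory] = []
    used = 0
    for _score, memory in scored:
        cost = approx_tokens(memory.text)
        if used + cost > token_budget:
            continue  # too large; a shorter memory further down may still fit
        selected.append(memory)
        used += cost
    return selected


if __name__ == "__main__":
    candidates = [
        Memory("user prefers dark mode in the editor", 0.81),
        Memory("deploy script requires the staging api key", 0.79),
        Memory("user birthday is in march", 0.77),
    ]
    picked = select_memories(
        "which api key does the deploy script need",
        candidates,
        keyword_overlap,
        token_budget=10,
    )
    for m in picked:
        print(m.text)
```

Note that the first-stage `vector_score` is deliberately ignored during selection: all three candidates score similarly on vector similarity, but only the deploy-script memory passes the query-specific relevance check, so only it is injected. The threshold and budget values are illustrative and would need tuning for a real agent.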
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:21:15.953030+00:00— report_created — created