Report #80236
[architecture] Retrieved memories polluting current context window and degrading response quality
Implement two-phase retrieval: first fetch broadly via vector similarity, then rerank and strictly filter the candidates against the *current*, specific intent using an LLM or cross-encoder before injecting them into the context window. Cap retrieved memory tokens at <20% of the available context window.
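A minimal sketch of the two-phase retrieval described above, assuming memories are stored as (text, embedding) pairs. The names `two_phase_retrieve`, `broad_k`, `min_score`, and `final_k` are hypothetical, and `keyword_overlap` is only a toy stand-in for a real LLM or cross-encoder scorer; the threshold values are illustrative, not recommendations.

```python
import math
from typing import Callable, List, Sequence, Tuple

Memory = Tuple[str, Sequence[float]]  # (text, embedding); assumed storage shape


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def two_phase_retrieve(
    query_vec: Sequence[float],
    intent: str,
    memories: List[Memory],
    rerank_score: Callable[[str, str], float],
    broad_k: int = 20,
    min_score: float = 0.5,
    final_k: int = 3,
) -> List[str]:
    # Phase 1: broad recall. Take the top broad_k memories by vector similarity.
    candidates = sorted(
        memories, key=lambda m: cosine(query_vec, m[1]), reverse=True
    )[:broad_k]

    # Phase 2: strict precision. Score each candidate against the current
    # intent, drop anything below the threshold, keep the best final_k.
    scored = [(rerank_score(intent, text), text) for text, _ in candidates]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in kept[:final_k]]


def keyword_overlap(intent: str, text: str) -> float:
    """Toy reranker: fraction of intent words that appear in the memory text.
    A real system would call a cross-encoder or an LLM here instead."""
    intent_words = set(intent.lower().split())
    text_words = set(text.lower().split())
    return len(intent_words & text_words) / len(intent_words) if intent_words else 0.0
```

Passing the reranker in as a callable keeps the sketch independent of any particular model; swapping in a cross-encoder only changes the `rerank_score` argument.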
Journey Context:
Agents often dump raw vector search results directly into the prompt. Irrelevant or tangential historical context then distracts the LLM and aggravates the 'lost in the middle' effect, in which models make poor use of information buried in a long context. Vector similarity alone is only a loose match for the current intent. The tradeoff is retrieval recall versus context precision: by aggressively reranking and truncating before context injection, you give up the chance of including a mildly relevant long-shot memory in order to preserve the LLM's reasoning coherence on the current task.
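The truncation step can be sketched as a simple token budget applied to the already-reranked list. This is an illustration under assumptions: `cap_to_budget` and `approx_tokens` are hypothetical names, and the whitespace word count is a crude proxy for a real tokenizer, which you would substitute in practice.

```python
from typing import Callable, List


def approx_tokens(text: str) -> int:
    """Crude token estimate: whitespace-separated words. Replace with the
    tokenizer of the target model for accurate counts."""
    return len(text.split())


def cap_to_budget(
    ranked_memories: List[str],
    context_window_tokens: int,
    count_tokens: Callable[[str], int] = approx_tokens,
    max_fraction: float = 0.20,
) -> List[str]:
    """Keep the highest-ranked memories whose combined size stays strictly
    under max_fraction of the context window."""
    budget = int(context_window_tokens * max_fraction)
    selected: List[str] = []
    used = 0
    for text in ranked_memories:  # assumed sorted best-first by the reranker
        cost = count_tokens(text)
        if used + cost >= budget:
            # Stop at the first memory that does not fit, so a lower-ranked
            # but shorter memory never displaces a higher-ranked one.
            break
        selected.append(text)
        used += cost
    return selected
```

Stopping at the first memory that does not fit is one possible policy; it deliberately trades recall for precision, which mirrors the tradeoff described above.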
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:16:45.831136+00:00— report_created — created