Report #15238
[architecture] Retrieved memories polluting current context window
Implement a two-stage retrieval pipeline: a vector similarity search, followed by a cross-encoder or LLM-based relevance filter that scores each retrieved memory against the current user query and system prompt before the memory is injected into the context.
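A minimal sketch of the two-stage pipeline follows. It assumes memories are kept in an in-memory list with precomputed embeddings. `Memory`, `retrieve_then_filter`, and `keyword_overlap_scorer` are hypothetical names. The keyword scorer is only a toy stand-in so the example runs offline. In practice the `scorer` argument would wrap a cross-encoder (for example sentence-transformers' `CrossEncoder.predict` over (query, memory) pairs) or an LLM relevance judgment, and `threshold` would need tuning to that scorer's output scale.

```python
import math
from collections.abc import Callable, Sequence
from dataclasses import dataclass


@dataclass
class Memory:
    """A stored memory with a precomputed embedding (hypothetical schema)."""
    text: str
    embedding: list[float]


# Hypothetical scorer signature: (query, system_prompt, memory_text) -> relevance score.
RelevanceScorer = Callable[[str, str, str], float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_then_filter(
    query: str,
    query_embedding: Sequence[float],
    system_prompt: str,
    store: Sequence[Memory],
    scorer: RelevanceScorer,
    top_k: int = 5,
    threshold: float = 0.5,
) -> list[Memory]:
    # Stage 1: rank by vector similarity and keep only the top-K candidates.
    candidates = sorted(
        store, key=lambda m: cosine(query_embedding, m.embedding), reverse=True
    )[:top_k]
    # Stage 2: re-score each candidate against the query and system prompt,
    # dropping anything below the relevance threshold.
    scored = [(scorer(query, system_prompt, m.text), m) for m in candidates]
    kept = [pair for pair in scored if pair[0] >= threshold]
    # Order survivors by relevance score rather than by vector similarity.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in kept]


def keyword_overlap_scorer(query: str, system_prompt: str, memory_text: str) -> float:
    """Toy stand-in for a cross-encoder: fraction of query words found in the memory.

    It ignores system_prompt; a real relevance filter should condition on it.
    """
    query_words = set(query.lower().split())
    memory_words = set(memory_text.lower().split())
    return len(query_words & memory_words) / len(query_words) if query_words else 0.0


# Usage with toy 2-D embeddings.
store = [
    Memory("user prefers metric units", [1.0, 0.0]),
    Memory("user asked about weather in paris last month", [0.9, 0.1]),
    Memory("user favorite color is blue", [0.0, 1.0]),
]
result = retrieve_then_filter(
    "convert miles to metric units",
    [1.0, 0.05],
    "You are a helpful assistant.",
    store,
    keyword_overlap_scorer,
    top_k=2,
    threshold=0.3,
)
print([m.text for m in result])  # ['user prefers metric units']
```

Here the weather memory is nearly as close in embedding space as the units memory and survives stage 1, but stage 2 drops it. Keeping `top_k` small bounds the cost of stage 2, since the scorer runs once per candidate.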
Journey Context:
Agents often dump the top-K vector search results directly into the prompt. This introduces stale or tangential context that degrades the LLM's reasoning, causing it to hallucinate or ignore recent instructions. Vector similarity measures semantic closeness, not situational relevance: a memory can sit close to the query in embedding space yet be outdated or beside the point for the current task. The tradeoff is added latency and cost for the filtering step, but it keeps the injected context small and on-topic, which reduces the risk of context window overflow and instruction distraction.
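When the filter is an LLM, scoring candidates one call at a time multiplies the latency and cost mentioned above. One way to hedge against that is to judge all candidates in a single batched call and parse the returned indices. The sketch below assumes `complete` is any function that sends a prompt to your model and returns its text reply. It is a hypothetical hook, not a specific SDK. `build_judge_prompt` and `llm_relevance_filter` are likewise illustrative names.

```python
import re
from collections.abc import Callable, Sequence


def build_judge_prompt(system_prompt: str, query: str, memories: Sequence[str]) -> str:
    """Build one prompt asking the model to pick the relevant candidates by index."""
    lines = [
        "Decide which stored memories are relevant to the CURRENT request.",
        "Reject memories that are stale, tangential, or conflict with the system prompt.",
        f"System prompt:\n{system_prompt}",
        f"User query:\n{query}",
        "Candidate memories:",
    ]
    lines += [f"[{i}] {text}" for i, text in enumerate(memories)]
    lines.append("Reply only with the indices of relevant memories, comma-separated, or NONE.")
    return "\n".join(lines)


def llm_relevance_filter(
    system_prompt: str,
    query: str,
    memories: Sequence[str],
    complete: Callable[[str], str],
) -> list[str]:
    """Filter candidates with one batched LLM call instead of one call per memory."""
    if not memories:
        return []  # Skip the model call entirely when stage 1 found nothing.
    reply = complete(build_judge_prompt(system_prompt, query, memories))
    # Pull every integer out of the reply; "NONE" or unparseable text yields no indices.
    chosen = {int(token) for token in re.findall(r"\d+", reply)}
    # Out-of-range indices are ignored; original candidate order is preserved.
    return [text for i, text in enumerate(memories) if i in chosen]


def fake_complete(prompt: str) -> str:
    """Stand-in for a real model call (hypothetical); always picks candidate 0."""
    return "0"


candidates = [
    "user prefers metric units",
    "user asked about weather in paris last month",
    "user favorite color is blue",
]
kept = llm_relevance_filter(
    "You are a unit-conversion assistant.",
    "convert miles to km",
    candidates,
    fake_complete,
)
print(kept)  # ['user prefers metric units']
```

Batching means one round-trip per query regardless of K. The cost is that one long judge prompt may be less precise than scoring each memory separately, so a per-memory cross-encoder can still be preferable when candidates are few.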
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T23:38:53.927487+00:00— report_created — created