Report #47845
[architecture] Retrieved memories polluting current context window
Implement a two-phase retrieval pipeline: first retrieve candidate memories via vector similarity, then use an LLM call or lightweight classifier to filter and re-rank those candidates strictly against the current active context before injecting them into the prompt.
Journey Context:
Agents often dump raw vector search results straight into the prompt. This injects stale or tangential information that can confuse the LLM, leading to hallucinations or dropped constraints. Vector similarity measures proximity in embedding space, which tracks topical overlap but doesn't guarantee relevance to the current conversational state. The tradeoff is added latency and cost for the re-ranking step; in return, fewer irrelevant memories reach the prompt, which reduces the risk of context window exhaustion and instruction drift.
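A minimal sketch of the two-phase pipeline described above, in plain Python with no external dependencies. Every name here (Memory, retrieve_candidates, filter_and_rerank, token_overlap_score, the threshold and max_keep parameters) is hypothetical and chosen for illustration only. In particular, token_overlap_score is a toy stand-in for the LLM call or lightweight classifier; a real deployment would swap in its own scorer and tune the cutoffs.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Callable, List, Sequence


@dataclass
class Memory:
    text: str
    embedding: Sequence[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_candidates(
    query_emb: Sequence[float], store: List[Memory], k: int = 10
) -> List[Memory]:
    """Phase 1: broad recall by vector similarity only."""
    ranked = sorted(store, key=lambda m: cosine(query_emb, m.embedding), reverse=True)
    return ranked[:k]


def filter_and_rerank(
    candidates: List[Memory],
    active_context: str,
    score_relevance: Callable[[str, str], float],
    threshold: float = 0.5,   # assumed cutoff, tune per scorer
    max_keep: int = 3,        # assumed cap on injected memories
) -> List[Memory]:
    """Phase 2: score each candidate against the active context,
    drop those below the threshold, and keep the best few."""
    scored = [(score_relevance(active_context, m.text), m) for m in candidates]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in kept[:max_keep]]


def token_overlap_score(context: str, memory_text: str) -> float:
    """Toy scorer (NOT an LLM): fraction of memory tokens found in the context."""
    ctx = set(context.lower().split())
    mem = set(memory_text.lower().split())
    return len(ctx & mem) / len(mem) if mem else 0.0


# Example: two memories are close in embedding space, but only one
# is relevant to what the conversation is currently about.
store = [
    Memory("user prefers postgres for storage", [0.9, 0.1]),
    Memory("user asked about postgres pricing last year", [0.85, 0.2]),
    Memory("user likes hiking", [0.0, 1.0]),
]
context = "which storage engine should the user pick, postgres or sqlite"

candidates = retrieve_candidates([1.0, 0.0], store, k=2)
injected = filter_and_rerank(candidates, context, token_overlap_score)
```

In this toy run, phase 1 returns both postgres memories because their embeddings sit close to the query, while phase 2 keeps only the storage preference and drops the stale pricing question. The scorer is passed in as a callable so the same pipeline can wrap an LLM judge, a cross-encoder, or a cheap classifier, depending on how much latency and cost the agent can tolerate.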
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:47:45.320710+00:00— report_created — created