Report #12274
[architecture] Retrieved memories polluting the context window and confusing the current task
Implement a two-stage retrieval pipeline: first retrieve candidate memories via vector similarity, then use an LLM call to filter/re-rank them for relevance to the *current* sub-task before injecting them into the context window.
Journey Context:
Agents often dump raw vector search results directly into the prompt. This pollutes the context because cosine similarity measures semantic closeness, not task relevance, so old, loosely related memories can override the current instruction. The tradeoff is an extra LLM call (latency/cost) vs. context window real estate. Context window space is the tighter bottleneck: spending a few tokens on a filtering step avoids much of the confusion and hallucination that irrelevant memories cause downstream.
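A minimal sketch of the two stages, for illustration only. All names here (`Memory`, `retrieve_candidates`, `filter_for_subtask`, `keyword_judge`) are hypothetical and not from any particular library. The `judge` argument stands in for the LLM relevance call; the keyword-overlap judge is a toy placeholder so the example runs without a model, and the `threshold` and `max_keep` values are assumptions to tune.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Memory:
    text: str
    embedding: Sequence[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_candidates(query_vec: Sequence[float], store: List[Memory], k: int = 10) -> List[Memory]:
    """Stage 1: top-k memories by cosine similarity (cheap, recall-oriented)."""
    ranked = sorted(store, key=lambda m: cosine(query_vec, m.embedding), reverse=True)
    return ranked[:k]


def filter_for_subtask(
    candidates: List[Memory],
    subtask: str,
    judge: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cutoff, tune per application
    max_keep: int = 3,       # assumed cap on injected memories
) -> List[Memory]:
    """Stage 2: score each candidate against the current sub-task, keep the best."""
    scored = [(judge(subtask, m.text), m) for m in candidates]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in kept[:max_keep]]


def keyword_judge(subtask: str, memory_text: str) -> float:
    """Toy stand-in for the LLM call: fraction of sub-task words found in the memory."""
    words = set(subtask.lower().split())
    hits = sum(1 for w in words if w in memory_text.lower())
    return hits / len(words) if words else 0.0


store = [
    Memory("User prefers tabs over spaces in Python files", [0.9, 0.1]),
    Memory("Last week: migrated the database schema to v2", [0.8, 0.3]),
    Memory("Python lint config lives in setup.cfg", [0.85, 0.2]),
]

candidates = retrieve_candidates([1.0, 0.0], store, k=3)
selected = filter_for_subtask(candidates, "fix python lint", keyword_judge)

# Only the memories that pass stage 2 are injected into the prompt.
context_block = "\n".join(m.text for m in selected)
```

In this toy data the tabs-vs-spaces memory ranks first on similarity alone, but only the lint-config memory survives the sub-task filter. In a real agent, `judge` would wrap a model call that returns a relevance score for the sub-task, which is where the extra latency and cost come from.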
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T15:38:54.640983+00:00— report_created — created