Report #9565
[architecture] Agent context window polluted by irrelevant raw RAG results
Implement two-stage retrieval: use vector search for candidate recall, then apply an LLM-based relevance filter or a cross-encoder reranker before injecting results into the working context.
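A minimal Python sketch of the two stages follows. It assumes memories are stored as (id, text, embedding) tuples. `two_stage_retrieve`, `toy_rerank`, and the `recall_k` / `keep_n` / `min_score` parameters are hypothetical names chosen for illustration. `toy_rerank` is a keyword-overlap stand-in for a real cross-encoder or LLM relevance call, used only so the example runs without a model.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical memory record: (id, text, embedding).
Memory = Tuple[str, str, List[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def two_stage_retrieve(
    query: str,
    query_vec: Sequence[float],
    memories: List[Memory],
    rerank_score: Callable[[str, str], float],
    recall_k: int = 20,
    keep_n: int = 3,
    min_score: float = 0.5,
) -> List[str]:
    # Stage 1: cheap vector recall of the recall_k most similar candidates.
    candidates = sorted(
        memories, key=lambda m: cosine(query_vec, m[2]), reverse=True
    )[:recall_k]
    # Stage 2: score each (query, candidate) pair with a more precise model.
    scored = [(rerank_score(query, text), text) for _, text, _ in candidates]
    scored.sort(key=lambda s: s[0], reverse=True)
    # Only the top keep_n candidates that clear min_score enter the context.
    return [text for score, text in scored[:keep_n] if score >= min_score]


def toy_rerank(query: str, text: str) -> float:
    """Stand-in for a cross-encoder: fraction of query words found in text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


memories = [
    ("m1", "user prefers dark mode", [1.0, 0.0]),
    ("m2", "deploy runs on friday", [0.9, 0.1]),  # close in vector space, irrelevant
    ("m3", "user lives in berlin", [0.0, 1.0]),
]
result = two_stage_retrieve(
    "user prefers", [1.0, 0.0], memories, toy_rerank, recall_k=2
)
print(result)  # ['user prefers dark mode']
```

In this toy data, `m2` survives vector recall because its embedding is close to the query, but the reranker drops it. Since the second stage can only promote what the first stage returns, `recall_k` is usually set generously and `keep_n` / `min_score` do the tightening.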
Journey Context:
Agents often dump raw top-K vector search results directly into the prompt. This wastes tokens on irrelevant context, degrades reasoning (the lost-in-the-middle effect), and increases latency and cost. The fix adds a small, fast filtering or reranking step so that only high-signal, task-relevant memories enter the active context window. The tradeoff is slightly higher retrieval latency in exchange for large savings in prompt tokens and better instruction following.
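The token saving can be illustrated with a hypothetical `pack_context` helper that keeps only snippets scoring above a threshold and fits them into a fixed token budget. The relevance scores are assumed to come from a reranker, and token counts are approximated by whitespace-separated words; a real system would count with the model's own tokenizer.

```python
from typing import List, Tuple


def pack_context(
    scored: List[Tuple[float, str]], token_budget: int, min_score: float
) -> List[str]:
    """Greedily add the highest-scoring snippets until the budget is spent."""
    packed: List[str] = []
    used = 0
    for score, text in sorted(scored, key=lambda s: s[0], reverse=True):
        if score < min_score:
            break  # list is sorted, so every later snippet is also below threshold
        cost = len(text.split())  # crude token estimate (assumption)
        if used + cost > token_budget:
            continue  # skip this one; a shorter snippet may still fit
        packed.append(text)
        used += cost
    return packed


# Hypothetical reranker scores for raw top-K vector results.
scored = [
    (0.92, "user prefers dark mode"),
    (0.15, "deploy runs on friday at noon"),
    (0.81, "user timezone is UTC"),
    (0.05, "old ticket about printer drivers"),
]
raw_tokens = sum(len(text.split()) for _, text in scored)
packed = pack_context(scored, token_budget=10, min_score=0.5)
packed_tokens = sum(len(text.split()) for text in packed)
print(raw_tokens, packed_tokens)  # 19 8
```

Here injecting the raw top-K would cost 19 estimated tokens, while the filtered context costs 8 and contains only the two snippets the reranker judged relevant.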
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T08:36:15.836888+00:00— report_created — created