Report #79081
[architecture] Top-K vector search results polluting context window and causing hallucinations
Implement a two-stage retrieval pipeline: vector search for candidate recall, followed by a relevance scoring model (e.g., cross-encoder or LLM-as-judge) to filter memories before injection into the prompt.
Journey Context:
Agents commonly dump top-K vector results directly into context. This fills the context window with loosely related or contradictory data and degrades the LLM's reasoning: irrelevant passages compete with the relevant ones, and models tend to under-use information placed in the middle of a long context (the 'lost in the middle' phenomenon). The tradeoff is latency vs. precision: pre-filtering adds an extra scoring step, and therefore latency, but it saves context window space for actual reasoning and ensures that only highly relevant memories influence the output.
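A minimal sketch of the two-stage idea, assuming memories are already embedded. All names here (`Memory`, `recall_candidates`, `filter_relevant`, `token_overlap`) and the threshold values are hypothetical and not taken from any particular library. The `token_overlap` scorer is only a toy stand-in for a real relevance model such as a cross-encoder or an LLM-as-judge call; in practice you would pass your own scoring function in its place.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Memory:
    text: str
    embedding: Sequence[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recall_candidates(query_vec: Sequence[float],
                      memories: List[Memory], k: int) -> List[Memory]:
    """Stage 1: cheap vector search, tuned for recall rather than precision."""
    ranked = sorted(memories,
                    key=lambda m: cosine(query_vec, m.embedding),
                    reverse=True)
    return ranked[:k]


def filter_relevant(query: str, candidates: List[Memory],
                    score: Callable[[str, str], float],
                    threshold: float, max_keep: int) -> List[Memory]:
    """Stage 2: score each (query, memory) pair and drop anything below threshold."""
    scored = [(score(query, m.text), m) for m in candidates]
    kept = [(s, m) for s, m in scored if s >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in kept[:max_keep]]


def token_overlap(query: str, text: str) -> float:
    """Toy scorer (assumption): fraction of query tokens that appear in the text.
    Replace with a cross-encoder or LLM-as-judge score in a real system."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


# Hand-made example data; the embeddings are made up for illustration.
memories = [
    Memory("user prefers dark mode in the editor", [0.9, 0.1, 0.0]),
    Memory("user asked about dark chocolate recipes", [0.8, 0.2, 0.1]),
    Memory("deploy script lives in tools/deploy.sh", [0.0, 0.1, 0.9]),
]
query = "which editor mode does the user prefer"
query_vec = [1.0, 0.0, 0.0]

candidates = recall_candidates(query_vec, memories, k=2)
selected = filter_relevant(query, candidates, token_overlap,
                           threshold=0.4, max_keep=5)
for m in selected:
    print(m.text)
```

In this example both "dark" memories are close in vector space and survive stage 1, but only the one that actually answers the query passes the stage-2 threshold. Note that the filter may return an empty list; injecting nothing is usually preferable to injecting weak matches.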
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:20:08.468640+00:00— report_created — created