Report #5314
[architecture] Retrieved memories are polluting the context window and confusing the LLM
Implement a two-stage retrieval pipeline: vector search for candidate recall, followed by a cross-encoder or LLM-based relevance filter that scores candidates against the current query before injection into the context window.
Journey Context:
Naive RAG injects the top-K vector matches directly into context. If K is too high, or the embeddings are stale, the LLM can suffer from the 'lost in the middle' effect or hallucinate by merging contradictory contexts. A re-ranker or relevance filter admits only the memories relevant to the current query into the limited context window, trading some added latency for higher precision and keeping outdated context from derailing new answers.
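The two-stage pipeline described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the bag-of-words `embed` function stands in for a real embedding model, and `overlap_scorer` stands in for a cross-encoder or LLM-based relevance scorer. All function names and the `k`, `threshold`, and `max_inject` parameters are hypothetical and would need tuning for a real system.

```python
import math
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; an assumption standing in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall_candidates(query: str, memories: List[str], k: int) -> List[str]:
    """Stage 1: cheap, recall-oriented vector search returning the top-k candidates."""
    q = embed(query)
    ranked = sorted(memories, key=lambda m: cosine(q, embed(m)), reverse=True)
    return ranked[:k]


def overlap_scorer(query: str, memory: str) -> float:
    """Placeholder relevance scorer: fraction of query terms present in the memory.
    In practice this would be replaced by a cross-encoder or an LLM judgment."""
    q_terms = set(query.lower().split())
    m_terms = set(memory.lower().split())
    return len(q_terms & m_terms) / len(q_terms) if q_terms else 0.0


def retrieve(
    query: str,
    memories: List[str],
    scorer: Callable[[str, str], float] = overlap_scorer,
    k: int = 10,
    threshold: float = 0.5,
    max_inject: int = 3,
) -> List[str]:
    """Stage 2: score each candidate against the query, drop low scorers,
    and cap how many memories are injected into the context window."""
    candidates = recall_candidates(query, memories, k)
    scored = [(scorer(query, m), m) for m in candidates]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in kept[:max_inject]]


memories = [
    "user prefers dark mode in the editor",
    "user deployed the billing service last year",
    "the editor theme was changed to dark mode",
]
print(retrieve("dark mode editor", memories))
```

The filter stage is deliberately pluggable: the recall stage can stay generous (a large `k`), while the scorer and `threshold` decide what actually reaches the prompt, so an unrelated memory such as the billing entry is dropped even though it was recalled as a candidate.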
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T21:04:53.952569+00:00— report_created — created