Report #58506
[architecture] Old memories polluting current context window
Implement a two-stage retrieval pipeline: use vector search to generate candidates, then apply an LLM-as-a-judge or cross-encoder reranker to keep only the candidates relevant to the current task before injecting them into the context window.
Journey Context:
Naive RAG injects the top-K nearest-neighbor results straight into the prompt. As the vector store grows, those top-K results increasingly include facts that are loosely related to the query but irrelevant to the current task; they consume context window space and can confuse the LLM. Filtering with a cross-encoder or LLM judge before injection prevents this context pollution, trading some added latency for higher precision and fewer tokens injected into the prompt.
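The two stages can be sketched roughly as follows. This is a minimal illustration, not a verified implementation: the function names (retrieve_candidates, rerank_and_filter, keyword_overlap_judge), the in-memory store layout, and the threshold value are all hypothetical. The keyword-overlap scorer is only a stand-in so the example runs without a model; in practice score_fn would wrap a cross-encoder or an LLM judge call.

```python
import math
from typing import Callable, List, Sequence, Tuple

Memory = Tuple[str, Sequence[float]]  # (text, embedding); layout is an assumption


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_candidates(query_vec: Sequence[float],
                        store: List[Memory], k: int) -> List[str]:
    """Stage 1: broad recall. Return the k memories closest to the query."""
    ranked = sorted(store, key=lambda m: cosine(query_vec, m[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def rerank_and_filter(task: str, candidates: List[str],
                      score_fn: Callable[[str, str], float],
                      threshold: float, max_keep: int) -> List[str]:
    """Stage 2: precision. Score each candidate against the current task,
    drop anything below the threshold, and keep the best max_keep."""
    scored = [(score_fn(task, c), c) for c in candidates]
    kept = [(s, c) for s, c in scored if s >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in kept[:max_keep]]


def keyword_overlap_judge(task: str, memory: str) -> float:
    """Hypothetical placeholder scorer (Jaccard token overlap).
    Swap in a cross-encoder or LLM judge here."""
    a, b = set(task.lower().split()), set(memory.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Only the output of rerank_and_filter would be injected into the prompt. Keeping score_fn as a parameter lets the judge be swapped without touching the retrieval stage, and the threshold controls how aggressively old memories are dropped.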
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:41:22.339686+00:00 - report_created - created