Report #65215
[frontier] RAG retrieves irrelevant chunks that pollute the agent's reasoning context and cause hallucinations
Implement two-stage retrieval (retrieve, then verify): first generate hypothetical filter criteria for the current step, retrieve against those criteria, then use a lightweight LLM judge to verify each chunk's relevance before it is included in context
Journey Context:
Naive RAG stuffs the top-k chunks into context, often including irrelevant text that confuses the agent. The 2025 frontier pattern treats retrieval as an agentic workflow: (1) the agent generates specific 'filter criteria' describing what makes a document relevant to the current step; (2) retrieval (hybrid search) uses these criteria; (3) a fast 'verifier' model (e.g., Haiku-grade) scores each retrieved chunk for relevance and for factual consistency against the known state; (4) only verified chunks enter the main agent's context. This mirrors how a human researcher works: scan, filter for relevance, verify facts, then read deeply. The effect is to keep noise out of the reasoning context.
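The four steps above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the `Chunk` type, the `verified_retrieve` function, the three callable signatures, and the `k` and `threshold` defaults are all hypothetical names and values chosen for the example. In a real system the criteria generator and verifier would be LLM calls and the retriever would be a hybrid search backend; here they are replaced with toy keyword-based stand-ins so the sketch runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Chunk:
    doc_id: str
    text: str


# Hypothetical interfaces (assumptions for this sketch, not a real library API).
CriteriaFn = Callable[[str], List[str]]                  # step -> filter criteria
RetrieveFn = Callable[[str, List[str], int], List[Chunk]]  # step, criteria, k -> candidates
VerifyFn = Callable[[Chunk, List[str], str], float]      # chunk, criteria, known state -> score in [0, 1]


def verified_retrieve(
    step: str,
    known_state: str,
    make_criteria: CriteriaFn,
    retrieve: RetrieveFn,
    verify: VerifyFn,
    k: int = 8,
    threshold: float = 0.7,
) -> List[Chunk]:
    """Return only the chunks the verifier scores at or above `threshold`."""
    criteria = make_criteria(step)            # (1) generate filter criteria
    candidates = retrieve(step, criteria, k)  # (2) retrieve using the criteria
    kept = []
    for chunk in candidates:                  # (3) score each chunk with the verifier
        score = verify(chunk, criteria, known_state)
        if score >= threshold:
            kept.append((score, chunk))
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in kept]       # (4) only verified chunks are returned


# --- Toy stand-ins so the sketch is runnable without any model or index ---

TOY_CORPUS = [
    Chunk("a", "To rotate expired tokens, call the refresh endpoint."),
    Chunk("b", "Office lunch menu for Friday."),
    Chunk("c", "Tokens are stored hashed."),
]


def toy_criteria(step: str) -> List[str]:
    # Stand-in for an LLM call: treat words longer than 3 characters as criteria.
    return [w.lower().strip("?.,") for w in step.split() if len(w) > 3]


def toy_retrieve(step: str, criteria: List[str], k: int) -> List[Chunk]:
    # Stand-in for hybrid search: return the first k chunks, unfiltered,
    # to imitate a noisy top-k result.
    return TOY_CORPUS[:k]


def toy_verify(chunk: Chunk, criteria: List[str], known_state: str) -> float:
    # Stand-in for the verifier model: fraction of criteria mentioned in the chunk.
    # A real verifier would also check consistency against `known_state`.
    if not criteria:
        return 0.0
    text = chunk.text.lower()
    return sum(c in text for c in criteria) / len(criteria)


if __name__ == "__main__":
    verified = verified_retrieve(
        "rotate expired API tokens", "", toy_criteria, toy_retrieve, toy_verify,
        k=3, threshold=0.5,
    )
    print([c.doc_id for c in verified])
```

Passing the three stages in as callables keeps the gating logic independent of any particular model or search backend, and the threshold is the main knob to tune: set too high it starves the agent of context, set too low it lets the noise back in.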
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T15:57:03.904107+00:00— report_created — created