Report #22572
[frontier] Naive RAG retrieves irrelevant documents — agent hallucinates answers based on poor context with no feedback loop to detect bad retrieval
Implement Corrective RAG (CRAG): after retrieval, add a retrieval grader step (an LLM call) that evaluates document relevance. If the documents are irrelevant, trigger query rewriting or a web search fallback. Proceed to generation only with context that has been graded relevant.
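A minimal sketch of the grader step, assuming the model is reachable through a plain `llm(prompt) -> str` callable. The names `GRADER_PROMPT` and `grade_documents` and the prompt wording are illustrative assumptions, not taken from any particular library.

```python
from typing import Callable, List

# Hypothetical grading prompt; the wording is an assumption for illustration.
GRADER_PROMPT = (
    "You are grading whether a retrieved document is relevant to a question.\n"
    "Question: {question}\n"
    "Document: {document}\n"
    "Answer with a single word: yes or no."
)


def grade_documents(
    question: str,
    documents: List[str],
    llm: Callable[[str], str],
) -> List[str]:
    """Return only the documents the grader marks as relevant."""
    relevant = []
    for doc in documents:
        verdict = llm(GRADER_PROMPT.format(question=question, document=doc))
        # Fail closed: anything other than a clear "yes" counts as irrelevant.
        if verdict.strip().lower().startswith("yes"):
            relevant.append(doc)
    return relevant
```

This sketch makes one grader call per document. Batching all retrieved documents into a single grading prompt would keep the overhead closer to one extra call per query, at the cost of more fragile output parsing.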
Journey Context:
Naive RAG (embed query → vector search → stuff into prompt → generate) fails silently in production: retrieval returns low-relevance documents, the generator has no way to know, and it hallucinates confidently from the bad context. The CRAG pattern, introduced in the Corrective Retrieval Augmented Generation paper (Yan et al., 2024) and commonly implemented as a LangGraph workflow, adds a self-reflection loop: grade the retrieval, and if it is poor, correct it. The three correction paths are: (1) rewrite the query and retry retrieval, (2) fall back to web search, (3) answer from the model's parametric knowledge with a disclaimer. The tradeoff: CRAG adds 1-2 extra LLM calls per query (the grading step and, when triggered, the rewriting step), increasing latency and cost by roughly 30-50%. In exchange, it substantially reduces hallucination from bad retrieval, which is the number one production RAG failure mode. The emerging consensus: any production RAG system needs at minimum a retrieval grading step. Naive RAG is only acceptable for demos.
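The three correction paths can be sketched as a small control loop. This is an illustration under stated assumptions, not the LangGraph implementation. The function `corrective_rag`, its parameters, the order in which the paths are tried, and the disclaimer text are all hypothetical, and the retriever, grader, rewriter, generator, and web search are passed in as plain callables.

```python
from typing import Callable, List, Optional

Retriever = Callable[[str], List[str]]           # query -> documents
Grader = Callable[[str, List[str]], List[str]]   # (question, docs) -> relevant docs


def corrective_rag(
    question: str,
    retrieve: Retriever,
    grade: Grader,
    rewrite: Callable[[str], str],
    generate: Callable[[str, List[str]], str],
    web_search: Optional[Retriever] = None,
    max_rewrites: int = 1,
) -> str:
    """Generate only from graded-relevant context, correcting retrieval if needed."""
    query = question
    for attempt in range(max_rewrites + 1):
        relevant = grade(question, retrieve(query))
        if relevant:
            return generate(question, relevant)
        if attempt < max_rewrites:
            # Path 1: rewrite the query and retry retrieval.
            query = rewrite(query)

    if web_search is not None:
        # Path 2: fall back to web search, still gated by the grader.
        relevant = grade(question, web_search(question))
        if relevant:
            return generate(question, relevant)

    # Path 3: answer from parametric knowledge, flagged with a disclaimer.
    disclaimer = "Note: no relevant sources were found; this answer is from model knowledge only. "
    return disclaimer + generate(question, [])
```

Capping retries with `max_rewrites` bounds the extra latency: in the worst case the loop stops rewriting and moves on to the next path instead of retrying indefinitely.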
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:17:58.794547+00:00— report_created — created