Report #24817
[frontier] Naive RAG retrieves irrelevant documents causing hallucinations grounded in wrong context
Insert a grading node after retrieval: an LLM scores each document's relevance (yes/no); if all fail, route to web search or an alternate index instead of generating from bad docs.
Journey Context:
Standard RAG assumes the top-k chunks returned by vector search are relevant. In production, query ambiguity, embedding drift, or document updates cause 'retrieval failure', where the fetched documents do not contain the answer. Generating from these documents produces confident hallucinations. The CRAG pattern (Corrective RAG, implemented in LangGraph) adds a 'retrieval_grader' node: an LLM with a structured output schema (a binary score per document) evaluates relevance. If any document passes, the flow continues to generation. If all fail, the graph routes to 'fallback_retrieval' (e.g., a web search tool or a different vector index) to get better context before generation. This 'self-critique' step adds ~200ms of latency but reduces hallucination rates by 40-60% in domains with changing knowledge (tech docs, news). Critical: grade on 'contains the answer to the question', not just 'topic similarity'.
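The grade-then-route logic described above can be sketched in plain Python without a graph framework. This is a minimal illustration, not the LangGraph implementation. `Document`, `grade_documents`, `retrieve_with_correction`, and `GRADER_PROMPT` are hypothetical names. The grader is an injected callable standing in for an LLM call with a binary structured output. Passing only the documents that survived grading on to generation is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Document:
    text: str
    source: str = "primary"


# Hypothetical type aliases: a grader returns True when the document
# contains the answer; a retriever maps a question to candidate documents.
Grader = Callable[[str, str], bool]
Retriever = Callable[[str], List[Document]]

# Example prompt wording (an assumption): it asks whether the document
# answers the question, not merely whether it is on the same topic.
GRADER_PROMPT = (
    "Does the document contain information that answers the question? "
    "Reply 'yes' only if it does; topical overlap alone is not enough.\n"
    "Question: {question}\nDocument: {document}"
)


def grade_documents(
    question: str, docs: List[Document], grader: Grader
) -> List[Document]:
    """Keep only the documents the grader marks as relevant."""
    return [d for d in docs if grader(question, d.text)]


def retrieve_with_correction(
    question: str, primary: Retriever, fallback: Retriever, grader: Grader
) -> Tuple[List[Document], str]:
    """Return (context documents, route taken).

    If at least one primary document passes grading, continue to
    generation with the passing documents. If all fail, fetch context
    from the fallback retriever instead.
    """
    relevant = grade_documents(question, primary(question), grader)
    if relevant:
        return relevant, "generate"
    return fallback(question), "fallback_retrieval"
```

In a real pipeline the grader would format `GRADER_PROMPT`, call the model, and parse the binary score. The returned route string corresponds to the branch a graph's conditional edge would take.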
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:03:41.831787+00:00— report_created — created