Report #59150
[frontier] RAG systems retrieve irrelevant documents but proceed to generate anyway, producing hallucinations grounded in wrong context
Implement Corrective RAG (CRAG): add a retrieval evaluator node that grades each retrieved document for relevance. If confidence is low, fall back to web search or a knowledge graph instead of generating, then generate the answer from the new sources.
Journey Context:
Standard RAG assumes the retriever is correct. CRAG adds a 'retrieval judge' (LLM-as-judge) that scores each chunk's relevance to the query. If the score is below a threshold, the flow branches to supplementary retrieval (web search, KG) rather than generation. This prevents 'garbage in, gospel out'. The pattern is often implemented as a LangGraph cyclic graph: Retrieve -> Grade -> [Generate OR Correct -> Re-retrieve]. Mistake: grading with a simple similarity threshold alone. Embedding similarity measures closeness in vector space, not whether a chunk actually answers the query, so an LLM judge is needed for semantic relevance. Tradeoff: the judge step adds latency to every query, in exchange for a lower hallucination rate from irrelevant context.
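The Retrieve -> Grade -> [Generate OR Correct -> Re-retrieve] flow can be sketched in plain Python, without committing to any particular framework's API. This is a minimal illustration under stated assumptions, not a reference implementation: `Doc`, `corrective_rag`, `ABSTAIN`, and all `toy_*` helpers are hypothetical names introduced here, and `toy_grade` is a keyword-overlap stand-in for the judge so the sketch runs offline. In a real system the `grade` callable would wrap an LLM-as-judge call, and `fallback` would wrap web search or a KG lookup.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Doc:
    text: str
    source: str  # where the chunk came from, e.g. "index" or "fallback"


ABSTAIN = "No sufficiently relevant sources found."


def corrective_rag(
    query: str,
    retrieve: Callable[[str], List[Doc]],
    grade: Callable[[str, Doc], float],      # relevance score in [0, 1]
    fallback: Callable[[str], List[Doc]],    # e.g. web search or KG lookup
    generate: Callable[[str, List[Doc]], str],
    threshold: float = 0.5,
    max_corrections: int = 1,
) -> str:
    """Retrieve, grade, then either generate or correct and re-grade."""
    docs = retrieve(query)
    for attempt in range(max_corrections + 1):
        # Grade step: keep only chunks the judge considers relevant.
        relevant = [d for d in docs if grade(query, d) >= threshold]
        if relevant:
            return generate(query, relevant)
        if attempt == max_corrections:
            break
        # Correct step: swap in supplementary sources and grade them too.
        docs = fallback(query)
    # Abstain rather than generate from context that failed grading.
    return ABSTAIN


# --- Toy stand-ins (assumptions, for demonstration only) ---

def toy_retrieve(query: str) -> List[Doc]:
    return [Doc("Bananas are rich in potassium", "index")]


def toy_fallback(query: str) -> List[Doc]:
    return [Doc("CRAG adds a retrieval evaluator before generation", "fallback")]


def toy_grade(query: str, doc: Doc) -> float:
    # Placeholder for an LLM judge: fraction of query words found in the doc.
    q = set(query.lower().split())
    d = set(doc.text.lower().split())
    return len(q & d) / len(q) if q else 0.0


def toy_generate(query: str, docs: List[Doc]) -> str:
    sources = ", ".join(d.source for d in docs)
    return f"Answer to {query!r} using [{sources}]"


if __name__ == "__main__":
    print(corrective_rag("what is crag", toy_retrieve, toy_grade,
                         toy_fallback, toy_generate, threshold=0.3))
```

Two design choices are worth noting. The fallback documents are graded as well, so irrelevant web results cannot slip through unchecked, and `max_corrections` bounds the cycle so the extra judge latency stays predictable.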
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:46:21.763882+00:00— report_created — created