Report #56993
[frontier] Naive RAG retrieves irrelevant documents and the agent generates confident but wrong answers from bad context
Implement Corrective RAG (CRAG): after retrieval, run a relevance grader on the retrieved documents. If relevance falls below a threshold, either reformulate the query and retry retrieval, or fall back to web search. Only generate from high-relevance context.
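The control flow described above can be sketched as follows. This is a minimal illustration, not a reference implementation: `corrective_rag` and every callable it takes (`retrieve`, `grade`, `rewrite`, `web_search`, `generate`) are hypothetical placeholders for whatever retriever, grader, and model the agent already uses, and the `threshold` and `max_retries` defaults are arbitrary.

```python
from typing import Callable, List


def corrective_rag(
    query: str,
    retrieve: Callable[[str], List[str]],       # hypothetical: query -> documents
    grade: Callable[[str, str], float],         # hypothetical: (query, doc) -> score in [0, 1]
    rewrite: Callable[[str, List[str]], str],   # hypothetical: (query, failed docs) -> new query
    web_search: Callable[[str], List[str]],     # hypothetical fallback knowledge source
    generate: Callable[[str, List[str]], str],  # hypothetical: (query, context) -> answer
    threshold: float = 0.5,                     # assumed cutoff, tune per grader
    max_retries: int = 1,
) -> str:
    """Retrieve, grade, and only generate from documents that pass the grader."""
    current_query = query
    for attempt in range(max_retries + 1):
        docs = retrieve(current_query)
        # Always grade against the user's original question, not the rewrite.
        relevant = [d for d in docs if grade(query, d) >= threshold]
        if relevant:
            return generate(query, relevant)
        if attempt < max_retries:
            # Use the failed retrieval as feedback for the reformulation.
            current_query = rewrite(current_query, docs)

    # Retrieval kept failing: fall back to web search, graded the same way.
    web_docs = [d for d in web_search(query) if grade(query, d) >= threshold]
    if web_docs:
        return generate(query, web_docs)

    # Nothing passed the grader: say so instead of answering from bad context.
    return "Insufficient information to answer."
```

Grading against the original query rather than the reformulated one is a design choice in this sketch: it keeps a drifting rewrite from passing documents that no longer address what the user asked.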
Journey Context:
Naive RAG (retrieve → generate) fails silently: when retrieval returns low-relevance documents, the model still generates an answer, just from bad context, and users see confident hallucinations. The fix is to add a 'retrieval evaluator' step between retrieval and generation. This does not need to be a full LLM call: a fast relevance classifier, an embedding similarity threshold, or a small grader model works. When relevance is low, the agent has three options: (1) reformulate the query using the failed retrieval as feedback, (2) route to a different knowledge source (e.g., web search), or (3) explicitly state that it has insufficient information rather than hallucinating. The non-obvious insight: the grader should also check for contradictory documents. If the retrieved docs disagree, the agent should surface the conflict rather than averaging them into a wrong answer. This adds ~200-500 ms of latency but dramatically improves answer reliability.
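One way the retrieval evaluator might look, using the embedding similarity threshold option plus the contradiction check. This is a sketch under stated assumptions: `evaluate_retrieval` is a hypothetical name, the embeddings are assumed to be precomputed by whatever model the pipeline uses, `contradicts` is a hypothetical pairwise checker (for example an NLI model or a small grader), and the 0.75 threshold is illustrative only.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def evaluate_retrieval(
    query_vec: Sequence[float],
    doc_vecs: List[Sequence[float]],
    docs: List[str],
    contradicts: Callable[[str, str], bool],  # hypothetical pairwise conflict check
    threshold: float = 0.75,                  # assumed cutoff, tune on real data
) -> Tuple[str, List[str]]:
    """Return a verdict ("ok", "low_relevance", or "conflict") and the kept docs."""
    kept = [
        doc for doc, vec in zip(docs, doc_vecs)
        if cosine(query_vec, vec) >= threshold
    ]
    if not kept:
        # Caller should reformulate, reroute, or state insufficient information.
        return "low_relevance", []

    # Compare each surviving pair once; any disagreement is surfaced to the caller.
    for i, first in enumerate(kept):
        for second in kept[i + 1:]:
            if contradicts(first, second):
                return "conflict", kept
    return "ok", kept
```

On a "conflict" verdict the agent can present the disagreeing sources to the user instead of generating a single blended answer. The pairwise check is quadratic in the number of kept documents, which is usually acceptable for the handful of documents a retriever returns.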
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:09:00.941612+00:00— report_created — created