Report #54861
[frontier] RAG pipeline retrieves irrelevant documents and generates plausible but unsupported answers
Insert a retrieval grader between retrieval and generation: an LLM step that evaluates each retrieved document's relevance to the query. If relevance is low, trigger a corrective action (query rewriting, web search fallback, or an explicit 'insufficient context' response) before generation.
Journey Context:
Naive RAG retrieves and generates in one pass. In production, this silently fails 20-30% of the time when: (1) the retrieved docs are irrelevant to the specific query, (2) the query is ambiguous and retrieval returns tangential results, or (3) the knowledge base lacks coverage. In each case the user gets a confident, plausible answer that isn't grounded in the retrieved documents. CRAG (Corrective RAG) adds a grader step that assesses retrieval quality before generation. If documents score low on relevance, the system can rephrase the query, fall back to web search, or honestly report insufficient context. This agentic RAG pattern is winning in production because it catches the failure mode that matters most: not 'no answer' but 'wrong answer with confidence.' The tradeoff is added latency and cost (one extra LLM call), but the reliability gain is worth it. Use a fast, cheap model for grading to minimize overhead.
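The control flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a reference CRAG implementation: every name here (`corrective_rag`, `RagResult`, `INSUFFICIENT_CONTEXT`, the `retrieve` / `grade` / `generate` / `rewrite` / `web_search` callables, and the `0.5` threshold) is hypothetical. The callables stand in for whatever retriever, grading model, and generator the pipeline already uses, and `grade` is assumed to return a relevance score between 0 and 1.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical fallback message returned when no relevant context is found.
INSUFFICIENT_CONTEXT = "I don't have enough relevant context to answer that."


@dataclass
class RagResult:
    answer: str
    action: str  # "generate", "rewrite", "web_search", or "insufficient_context"
    documents: List[str]


def _keep_relevant(query: str, docs: List[str],
                   grade: Callable[[str, str], float],
                   threshold: float) -> List[str]:
    """Keep only the documents the grader scores at or above the threshold."""
    return [d for d in docs if grade(query, d) >= threshold]


def corrective_rag(
    query: str,
    retrieve: Callable[[str], List[str]],
    grade: Callable[[str, str], float],          # assumed: score in [0, 1]
    generate: Callable[[str, List[str]], str],
    rewrite: Optional[Callable[[str], str]] = None,
    web_search: Optional[Callable[[str], List[str]]] = None,
    threshold: float = 0.5,                      # assumed cutoff, tune per system
    max_rewrites: int = 1,
) -> RagResult:
    # Normal path: grade the retrieved docs before generating anything.
    docs = _keep_relevant(query, retrieve(query), grade, threshold)
    if docs:
        return RagResult(generate(query, docs), "generate", docs)

    # Corrective action 1: rewrite the query and retrieve again.
    # Documents are still graded against the ORIGINAL query.
    if rewrite is not None:
        current = query
        for _ in range(max_rewrites):
            current = rewrite(current)
            docs = _keep_relevant(query, retrieve(current), grade, threshold)
            if docs:
                return RagResult(generate(query, docs), "rewrite", docs)

    # Corrective action 2: fall back to web search, graded the same way.
    if web_search is not None:
        docs = _keep_relevant(query, web_search(query), grade, threshold)
        if docs:
            return RagResult(generate(query, docs), "web_search", docs)

    # Corrective action 3: say so, instead of generating an ungrounded answer.
    return RagResult(INSUFFICIENT_CONTEXT, "insufficient_context", [])
```

In this sketch the generator is never called unless at least one document passed the grader, which is the property that guards against confident but unsupported answers. The `action` field records which path produced the answer, so the rate of rewrites, web fallbacks, and 'insufficient context' responses can be logged and monitored.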
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:34:50.057861+00:00— report_created — created