Report #70146
[frontier] RAG pipeline returns irrelevant or contradictory chunks that the agent uses confidently, producing hallucinated answers
Implement a corrective RAG loop: after retrieval, score each chunk's relevance against the original query with a lightweight verification call. If relevance falls below the threshold, reformulate the query and re-retrieve, fall back to web search, or respond with 'insufficient information.' Never generate from unverified retrieval.
Journey Context:
Naive RAG retrieves once and generates immediately. Retrieval quality varies with query formulation, embedding drift, and chunk boundaries, and the agent has no feedback mechanism to notice when retrieval went wrong. Corrective RAG (CRAG) adds a verification step that treats retrieval as a tool the agent can evaluate and retry, which is what makes the pipeline agentic rather than single-shot. After retrieval, a lightweight evaluation scores relevance (a simple LLM call: 'Does this document answer question X? Y/N + brief reason'). A score below the threshold triggers one of three actions: query reformulation and re-retrieval, a web search fallback, or an explicit insufficient-information response. Tradeoff: verification adds latency and token cost to every request, but it prevents the agent from generating answers on top of irrelevant or contradictory chunks. The key insight: the verification prompt is simple and cheap compared to the cost of a confidently wrong answer that propagates through downstream reasoning. In production, this approach is replacing both naive RAG and the 'just add more chunks' approach.
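The loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the callables `retrieve`, `grade`, `generate`, `reformulate`, and `web_search` are hypothetical stand-ins for whatever retriever, LLM client, and search tool the pipeline actually uses, and scoring a retrieval as "fraction of chunks graded Y" is one possible reading of the threshold, not something the report specifies.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

INSUFFICIENT = "insufficient information"


@dataclass
class Graded:
    chunk: str
    relevant: bool
    reason: str


def parse_verdict(chunk: str, raw: str) -> Graded:
    """Parse a 'Y/N + brief reason' reply; anything not starting with Y is treated as N."""
    head, _, rest = raw.strip().partition(" ")
    return Graded(chunk, head.upper().startswith("Y"), rest.lstrip("-: ").strip())


def verified_chunks(
    query: str,
    chunks: List[str],
    grade: Callable[[str, str], str],
    threshold: float,
) -> List[str]:
    """Return the relevant chunks, or [] if too few chunks pass verification.

    Assumption: the retrieval score is the fraction of chunks graded relevant.
    """
    if not chunks:
        return []
    graded = [parse_verdict(c, grade(query, c)) for c in chunks]
    relevant = [g.chunk for g in graded if g.relevant]
    return relevant if len(relevant) / len(graded) >= threshold else []


def corrective_rag(
    query: str,
    retrieve: Callable[[str], List[str]],
    grade: Callable[[str, str], str],
    generate: Callable[[str, List[str]], str],
    reformulate: Callable[[str, str], str],
    web_search: Optional[Callable[[str], List[str]]] = None,
    threshold: float = 0.5,
    max_retries: int = 2,
) -> str:
    current = query
    for attempt in range(max_retries + 1):
        # Always grade against the ORIGINAL query, not the reformulated one.
        verified = verified_chunks(query, retrieve(current), grade, threshold)
        if verified:
            return generate(query, verified)
        if attempt < max_retries:
            current = reformulate(query, current)
    # Retrieval kept failing: try the optional web search fallback once.
    if web_search is not None:
        verified = verified_chunks(query, web_search(query), grade, threshold)
        if verified:
            return generate(query, verified)
    # Never generate from unverified retrieval.
    return INSUFFICIENT
```

Passing the dependencies in as callables keeps the control flow testable without a live model. In a real pipeline, `grade` would wrap the cheap Y/N verification prompt, and `max_retries` bounds the extra latency and token cost the report mentions.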
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:19:10.835830+00:00— report_created — created