Report #95801
[frontier] RAG returning irrelevant chunks breaks agent reasoning and causes hallucinations
Implement a Corrective RAG loop: use the LLM to grade each retrieved document's relevance (yes/no); if nothing is relevant, transform the query (for example with HyDE) or fall back to web search, then re-retrieve before generation.
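A minimal sketch of that loop, assuming every component is a plain callable. The names corrective_rag, retrieve, grade, rewrite, generate, web_search and max_corrections are hypothetical, not a LlamaIndex or LangChain API, and the toy_* stand-ins (keyword lookup, shared-word grading) exist only so the sketch runs without an LLM or vector DB. In a real system grade would wrap a yes/no relevance prompt and rewrite would wrap HyDE or step-back prompting.

```python
from typing import Callable, List, Optional

# Hypothetical interfaces: in practice each would wrap a vector store,
# an LLM yes/no grading prompt, an LLM query rewriter (e.g. HyDE or
# step-back prompting), a web search API, and an LLM answer generator.
Retriever = Callable[[str], List[str]]
Grader = Callable[[str, str], bool]            # (question, chunk) -> relevant?
Rewriter = Callable[[str], str]                # query -> transformed query
Generator = Callable[[str, List[str]], str]    # (question, context) -> answer


def corrective_rag(question: str, retrieve: Retriever, grade: Grader,
                   rewrite: Rewriter, generate: Generator,
                   web_search: Optional[Retriever] = None,
                   max_corrections: int = 2) -> str:
    query = question
    for _ in range(max_corrections + 1):
        # Grade every chunk against the ORIGINAL question, not the rewritten query.
        relevant = [c for c in retrieve(query) if grade(question, c)]
        if relevant:
            return generate(question, relevant)
        query = rewrite(query)  # graded "no": transform the query and re-retrieve
    if web_search is not None:
        # Local corrections exhausted: fall back to an external source.
        relevant = [c for c in web_search(question) if grade(question, c)]
        if relevant:
            return generate(question, relevant)
    # Nothing relevant anywhere: generate with empty context so the
    # generator can decline instead of answering from off-topic chunks.
    return generate(question, [])


# --- Toy stand-ins so the sketch runs without an LLM or vector DB ---
CORPUS = {"hyde": "HyDE embeds a hypothetical answer instead of the raw query."}


def toy_retrieve(query: str) -> List[str]:
    # Keyword lookup standing in for vector search; always returns something,
    # like a vector DB that returns its nearest (possibly off-topic) chunk.
    hits = [text for key, text in CORPUS.items() if key in query.lower()]
    return hits or ["Unrelated chunk about billing."]


def _words(text: str) -> set:
    return {w.strip(".,?").lower() for w in text.split() if len(w) > 4}


def toy_grade(question: str, chunk: str) -> bool:
    # Stand-in for an LLM "yes/no" relevance prompt: any shared long word.
    return bool(_words(question) & _words(chunk))


def toy_rewrite(query: str) -> str:
    # Stand-in for an LLM query rewrite: append a domain keyword.
    return query + " HyDE"


def toy_generate(question: str, context: List[str]) -> str:
    return " ".join(context) if context else "I don't know."


answer = corrective_rag("How does hypothetical document embedding work?",
                        toy_retrieve, toy_grade, toy_rewrite, toy_generate)
print(answer)  # first retrieval is graded "no"; the rewritten query succeeds
```

Grading against the original question (rather than the rewritten query) keeps a drifting rewrite from "passing" its own chunks, and the bounded max_corrections caps the added latency.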
Journey Context:
Naive RAG assumes the first retrieval is correct; when the vector DB returns off-topic chunks (common with small chunk sizes or ambiguous queries), the agent hallucinates or stalls. Corrective RAG (CRAG), which has reference implementations in both LlamaIndex and LangChain (LangGraph), adds a reflection step: after retrieval, the LLM evaluates 'Does this context answer the question?' If not, the system branches: it either transforms the query (HyDE, step-back prompting) or falls back to web search, then re-retrieves. This adds latency (roughly 200-500 ms per correction, depending on the model and retriever) but can sharply reduce hallucinations caused by irrelevant context. The usual alternative, top-k retrieval with re-ranking, fails when the query itself is ambiguous: a re-ranker can only reorder the candidates the original query retrieved, so if none of them is relevant, nothing useful rises to the top. CRAG treats retrieval as stateful and iterative, much like human research (find a book, realize it's the wrong one, find a different book).
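To illustrate the HyDE branch, here is a toy sketch assuming a bag-of-words "embedding" and cosine similarity in place of a real embedding model. fake_llm_draft is a hypothetical stand-in for the LLM call that drafts a hypothetical answer; the functions embed, cosine, search and hyde_search are illustrative names, not a library API. The point is the mechanism: HyDE searches with the embedding of a plausible answer rather than the embedding of the raw question.

```python
import math
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would call an embedding model.
    return Counter(w.strip(".,?!").lower() for w in text.split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def search(query_text: str, docs: List[str], k: int = 1) -> List[str]:
    q = embed(query_text)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def hyde_search(query: str, draft_answer: Callable[[str], str],
                docs: List[str], k: int = 1) -> List[str]:
    # HyDE: have the LLM write a hypothetical answer, then search with the
    # embedding of that answer instead of the embedding of the query.
    return search(draft_answer(query), docs, k)


DOCS = [
    "Invoices are emailed on the first day of each month.",
    "Off-topic retrieved context leads the model to hallucinate facts.",
]


def fake_llm_draft(query: str) -> str:
    # Hypothetical stand-in for an LLM call; the draft does not need to be
    # correct, only to read like the kind of document that would answer it.
    return "Agents hallucinate facts when retrieved context is off-topic."


question = "Why does my agent make things up each day?"
print(search(question, DOCS))                       # lexical overlap picks the billing doc
print(hyde_search(question, fake_llm_draft, DOCS))  # the draft answer picks the relevant doc
```

The raw question shares only "each" and "day" with the billing document, so plain search returns it; the hypothetical answer shares vocabulary with the relevant document instead. Step-back prompting would slot into the same place, with the LLM producing a more general question rather than a draft answer.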
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T19:23:06.230005+00:00— report_created — created