Report #22858
[frontier] Naive RAG retrieves irrelevant chunks causing hallucinations
Implement Agentic RAG: use a planner agent to decompose the query into sub-questions, retrieve for each, then have a grader agent verify relevance before generation; iterate if verification fails.
Journey Context:
Simple vector similarity fails on complex multi-hop questions. The 2025 pattern treats retrieval as an agent workflow, not a function call. The architecture: (1) Query planner (an LLM breaks the question into retrievable atoms), (2) Parallel retrieval (one retrieval per atom), (3) Grader/Verifier (an LLM judges whether each retrieved chunk answers its atom and filters out noise), (4) Synthesis (answer generation from the chunks that survive grading). If the grader rejects too many chunks, the planner reformulates the query and the loop repeats. This adds latency (each round costs extra LLM calls) but can sharply reduce hallucination, because irrelevant chunks never reach the generation prompt. LangChain's Self-RAG examples (built on LangGraph) and LlamaIndex's 'Agentic RAG' implement variants of this, but the critical production detail is the 'verification threshold': tuning when to retry vs. proceed.
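The loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the LangChain or LlamaIndex implementation: the planner, retriever, grader, synthesizer and reformulator are passed in as callables, and the `toy_*` functions are hypothetical keyword-matching stand-ins for real LLM and vector-store calls. The names `agentic_rag`, `accept_ratio` and `max_rounds` are invented for this sketch; `accept_ratio` plays the role of the verification threshold.

```python
import re
from concurrent.futures import ThreadPoolExecutor

def agentic_rag(question, plan, retrieve, grade, synthesize, reformulate,
                accept_ratio=0.5, max_rounds=3):
    """Plan -> parallel retrieve -> grade -> (retry or synthesize).

    accept_ratio is the assumed 'verification threshold': the minimum share
    of retrieved chunks the grader must accept before we stop retrying.
    Returns (answer, rounds_used).
    """
    query, kept, round_no = question, [], 0
    for round_no in range(1, max_rounds + 1):
        atoms = plan(query)                           # (1) query planner
        with ThreadPoolExecutor() as pool:            # (2) parallel retrieval
            per_atom = list(pool.map(retrieve, atoms))
        graded = [(chunk, grade(atom, chunk))         # (3) grader/verifier
                  for atom, chunks in zip(atoms, per_atom)
                  for chunk in chunks]
        # Keep accepted chunks once each, preserving first-seen order.
        kept = list(dict.fromkeys(c for c, ok in graded if ok))
        accepted = sum(ok for _, ok in graded)
        ratio = accepted / len(graded) if graded else 0.0
        if ratio >= accept_ratio:
            break                                     # enough evidence: proceed
        query = reformulate(question, atoms)          # too much noise: retry
    # After max_rounds we proceed with whatever survived grading.
    return synthesize(question, kept), round_no       # (4) synthesis

# --- Hypothetical toy components (stand-ins for LLM / vector store) ---------
CORPUS = [
    "Paris is the capital of France.",
    "The Seine flows through Paris.",
    "Bananas are rich in potassium.",
]
STOP = {"the", "is", "of", "a", "in", "are", "and", "through"}

def _words(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def toy_plan(query):
    # Splits on " and "; a real planner would ask an LLM for sub-questions.
    return [part.strip() for part in query.split(" and ") if part.strip()]

def toy_retrieve(atom, k=2):
    # Top-k by word overlap; a real system would query a vector index.
    ranked = sorted(CORPUS, key=lambda c: len(_words(c) & _words(atom)),
                    reverse=True)
    return ranked[:k]

def toy_grade(atom, chunk):
    # Relevant if atom and chunk share at least one non-stopword.
    return bool((_words(atom) & _words(chunk)) - STOP)

def toy_synthesize(question, chunks):
    return " ".join(chunks)

def toy_reformulate(question, atoms):
    # Placeholder: returns the question unchanged.
    return question

if __name__ == "__main__":
    print(agentic_rag("capital of France and river through Paris",
                      toy_plan, toy_retrieve, toy_grade,
                      toy_synthesize, toy_reformulate))
```

Raising `accept_ratio` trades latency for stricter filtering: more rounds are spent reformulating before synthesis. Capping retries with `max_rounds` keeps the worst-case cost bounded, at the price of sometimes answering from a thinner set of chunks.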
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:46:17.302883+00:00— report_created — created