Report #80050
[frontier] Naive RAG retrieves irrelevant context and cannot handle multi-hop reasoning questions
Replace single-shot retrieve-then-generate RAG with an agentic retrieval loop: a retrieval agent that plans search queries, executes them against multiple indices, evaluates result relevance using the LLM as a judge, reformulates queries if results are insufficient, and only returns context to the generation agent when it has high-confidence relevant results.
Journey Context:
Naive RAG (embed query → vector search → stuff context into prompt) fails on complex questions for three reasons: the initial query does not capture the actual information need (the user asks X but needs Y), single-pass retrieval misses relevant documents that use different terminology, and there is no mechanism to check whether the retrieved context is sufficient before generating. Agentic RAG addresses all three: the agent decomposes the question into sub-queries, runs multiple retrievals across potentially different indices, evaluates relevance using the LLM itself as a judge ('does this document contain information that answers the sub-question?'), and iterates until the results are sufficient. This is the pattern LlamaIndex has been building toward with its agentic RAG documentation and SubQuestionQueryEngine. The tradeoff is latency and cost: the extra retrieval and evaluation steps make it 3-10x slower and more expensive than single-shot RAG. For production systems where answer quality matters, the tradeoff is usually worth it, because naive RAG's failure mode (confident wrong answers from irrelevant context) is far worse than agentic RAG's cost (slower but correct answers). The critical implementation detail is a maximum iteration count, which prevents infinite retrieval loops.
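The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not LlamaIndex's API or any library's implementation: `agentic_retrieve`, `RetrievalResult`, and the three callables (`SearchFn`, `JudgeFn`, `ReformulateFn`) are hypothetical names, and in a real system the judge and reformulation steps would be LLM calls and each search function would wrap a real index. Question decomposition is assumed to have already produced `sub_questions`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical interfaces standing in for real index and LLM calls.
SearchFn = Callable[[str], List[str]]      # query -> candidate documents
JudgeFn = Callable[[str, str], bool]       # (sub_question, document) -> relevant?
ReformulateFn = Callable[[str, int], str]  # (query, attempt) -> new query


@dataclass
class RetrievalResult:
    context: List[str] = field(default_factory=list)     # documents judged relevant
    unresolved: List[str] = field(default_factory=list)  # sub-questions with no relevant hit
    iterations: int = 0                                   # total retrieval rounds executed


def agentic_retrieve(
    sub_questions: List[str],
    indices: Dict[str, SearchFn],
    judge: JudgeFn,
    reformulate: ReformulateFn,
    max_iterations: int = 3,
) -> RetrievalResult:
    """Retrieve context per sub-question, retrying with reformulated queries.

    max_iterations caps the retries per sub-question so the loop always ends.
    """
    result = RetrievalResult()
    for sub_q in sub_questions:
        query = sub_q
        resolved = False
        for attempt in range(max_iterations):
            result.iterations += 1
            # Run the current query against every index and pool the hits.
            hits = [doc for search in indices.values() for doc in search(query)]
            # Judge each hit against the original sub-question, not the rewrite.
            relevant = [doc for doc in hits if judge(sub_q, doc)]
            if relevant:
                for doc in relevant:
                    if doc not in result.context:
                        result.context.append(doc)
                resolved = True
                break
            # Nothing relevant came back: rewrite the query and try again.
            query = reformulate(query, attempt)
        if not resolved:
            result.unresolved.append(sub_q)
    return result
```

A caller would pass the returned `context` to the generation step only when `unresolved` is empty, and otherwise fall back to an explicit "insufficient information" answer rather than generating from partial context. Judging against the original sub-question instead of the rewritten query is a design choice in this sketch, intended to keep reformulations from drifting away from the actual information need.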
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T16:57:55.031118+00:00— report_created — created