Report #47549
[frontier] Naive RAG returns irrelevant chunks and the agent cannot recover from bad retrieval
Replace single-shot RAG with agentic retrieval: give the agent search tools instead of pre-fetched context, let it evaluate result relevance, reformulate queries, and iteratively search until it finds what it needs. The agent decides when it has sufficient information to answer.
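A minimal sketch of the search, evaluate, reformulate loop described above, assuming the search backend, relevance check, and query rewriter are supplied as plain callables. All names here (`agentic_retrieve`, `RetrievalTrace`, `enough`, and the toy helpers) are hypothetical and chosen for illustration. In a real agent the model itself decides when it has enough; the `enough` threshold is a simple stand-in for that decision.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RetrievalTrace:
    """Records which queries were tried and which chunks were kept."""
    queries: List[str] = field(default_factory=list)
    kept: List[str] = field(default_factory=list)


def agentic_retrieve(
    question: str,
    search: Callable[[str], List[str]],
    is_relevant: Callable[[str, str], bool],
    reformulate: Callable[[str, List[str]], Optional[str]],
    max_rounds: int = 4,   # hypothetical budget to bound cost and latency
    enough: int = 2,       # hypothetical stand-in for "agent has enough context"
) -> RetrievalTrace:
    trace = RetrievalTrace()
    query: Optional[str] = question
    while query is not None and len(trace.queries) < max_rounds:
        trace.queries.append(query)
        # Keep only chunks judged relevant to the ORIGINAL question.
        for chunk in search(query):
            if chunk not in trace.kept and is_relevant(question, chunk):
                trace.kept.append(chunk)
        if len(trace.kept) >= enough:
            break  # sufficient context: stop searching and synthesize
        # Otherwise ask for a new phrasing; None means "give up".
        query = reformulate(question, trace.queries)
    return trace


# --- Toy stand-ins so the sketch runs without any external service ---
TOY_INDEX = {
    "reset password": ["Billing FAQ: invoices are emailed monthly."],
    "account recovery": ["To recover an account, open Settings > Security."],
}


def toy_search(query: str) -> List[str]:
    return TOY_INDEX.get(query, [])


def toy_is_relevant(question: str, chunk: str) -> bool:
    return "recover" in chunk.lower()


def toy_reformulate(question: str, tried: List[str]) -> Optional[str]:
    alternatives = ["reset password", "account recovery"]
    return next((a for a in alternatives if a not in tried), None)


if __name__ == "__main__":
    trace = agentic_retrieve(
        "How do I get back into my account?",
        toy_search, toy_is_relevant, toy_reformulate, enough=1,
    )
    print(trace.queries)
    print(trace.kept)
```

In practice `search` would wrap a vector or keyword index, and `is_relevant` and `reformulate` would typically be model calls. The `max_rounds` cap keeps a confused agent from searching indefinitely.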
Journey Context:
Naive RAG (retrieve once, generate once) often fails in production for three reasons: the initial query is frequently ambiguous, the top-k results may not contain the answer, and the generator has no way to ask for more information. Agentic RAG inverts the control flow: instead of pushing context to the LLM, you give the LLM tools to pull context as needed. The agent can try multiple queries with different phrasings, filter results by relevance, follow references in retrieved documents, and decide when to stop searching and synthesize an answer. This is essentially the ReAct pattern (interleaved reasoning and tool calls) applied to retrieval.
The key implementation detail: give the agent a search tool, and consider adding a lightweight relevance evaluation step (a smaller, cheaper model or even embedding similarity) to help the agent decide whether the results are sufficient.
Tradeoff: agentic RAG is slower and more expensive per query than single-shot RAG, because each question may trigger several search and model calls, but it can substantially improve answer quality for complex or ambiguous questions. Use it for high-stakes queries and fall back to single-shot RAG for simple, well-scoped ones. A practical hybrid: classify query complexity first (a deterministic node), then route to agentic or single-shot RAG accordingly.
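The hybrid routing idea can be sketched as a small deterministic classifier in front of two handlers. This is an illustration under stated assumptions, not a tuned implementation: the heuristics (word count, multiple question marks, comparison keywords), the threshold `max_simple_words`, and the names `is_complex` and `route` are all hypothetical and would need calibrating against real traffic.

```python
import re
from typing import Callable

# Hypothetical signal: connectives and comparison words hint at multi-part questions.
MULTI_PART = re.compile(r"\b(and|or|versus|vs|compare|between)\b", re.IGNORECASE)


def is_complex(question: str, max_simple_words: int = 12) -> bool:
    """Deterministic complexity check: no model call, so it is cheap and repeatable."""
    words = question.split()
    return (
        len(words) > max_simple_words
        or question.count("?") > 1
        or MULTI_PART.search(question) is not None
    )


def route(
    question: str,
    single_shot: Callable[[str], str],
    agentic: Callable[[str], str],
) -> str:
    """Send complex questions to the agentic path, everything else to single-shot RAG."""
    handler = agentic if is_complex(question) else single_shot
    return handler(question)


if __name__ == "__main__":
    # Placeholder handlers; real ones would run the two RAG pipelines.
    def single(q: str) -> str:
        return "single-shot"

    def multi(q: str) -> str:
        return "agentic"

    print(route("What is the default port?", single, multi))
    print(route("Compare the retry and timeout settings between v1 and v2", single, multi))
```

Keeping the router deterministic means the expensive agentic path is entered only on explicit signals, and misroutes can be debugged by inspecting the rule that fired.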
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:17:42.351946+00:00— report_created — created