Report #75038
[frontier] Naive RAG returns irrelevant chunks and the agent cannot refine its retrieval when results are poor
Implement agentic RAG: expose retrieval as a tool the agent can call iteratively. Give the agent a search tool, a query_reformulation tool, and an assess_sufficiency signal. The agent searches, evaluates the results, reformulates the query if needed, and searches again, looping until it has sufficient context or hits a max-retrieval limit.
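Below is a minimal, self-contained sketch of that loop. It assumes a toy in-memory keyword index instead of a real vector store. `CORPUS`, `SYNONYMS`, `search`, `reformulate`, `assess_sufficiency`, and `agentic_retrieve` are hypothetical names chosen for illustration. In a real agent, reformulation and the sufficiency judgment would be LLM calls; the heuristics here only stand in for them.

```python
import re

# Hypothetical toy corpus standing in for a real vector index.
CORPUS = {
    "doc1": "Deployment steps: build the image, push it to the registry, then apply the manifest.",
    "doc2": "Prerequisites for deployment: install kubectl and configure cluster credentials.",
    "doc3": "Our office is closed on public holidays.",
}

# Hypothetical rewrite table; a real query_reformulation tool would prompt the LLM.
SYNONYMS = {"deploy": "deployment steps prerequisites"}


def tokenize(text: str) -> set[str]:
    """Lowercase word set used for naive keyword matching."""
    return set(re.findall(r"[a-z]+", text.lower()))


def search(query: str, k: int = 2) -> list[tuple[str, int]]:
    """Search tool: return up to k (doc_id, overlap_score) pairs with a nonzero score."""
    q = tokenize(query)
    scored = sorted(((len(q & tokenize(text)), doc_id) for doc_id, text in CORPUS.items()), reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k] if score > 0]


def reformulate(query: str) -> str:
    """query_reformulation tool (heuristic stand-in): expand terms via SYNONYMS."""
    return " ".join(SYNONYMS.get(w, w) for w in re.findall(r"[a-z]+", query.lower()))


def assess_sufficiency(context: dict[str, int], min_score: int = 2, min_chunks: int = 2) -> bool:
    """assess_sufficiency signal (heuristic stand-in): enough strongly matching chunks?"""
    return sum(1 for score in context.values() if score >= min_score) >= min_chunks


def agentic_retrieve(question: str, max_rounds: int = 3) -> tuple[list[str], int, bool]:
    """Search, evaluate, reformulate, repeat; returns (doc_ids, rounds_used, sufficient)."""
    query = question
    context: dict[str, int] = {}  # doc_id -> best score seen across rounds
    for round_no in range(1, max_rounds + 1):
        for doc_id, score in search(query):
            context[doc_id] = max(score, context.get(doc_id, 0))
        if assess_sufficiency(context):
            return sorted(context), round_no, True  # short-circuit once context is good
        query = reformulate(query)  # results were poor: rewrite the query and retry
    return sorted(context), max_rounds, False  # hit the max-retrieval limit


print(agentic_retrieve("How do I deploy?"))            # (['doc1', 'doc2'], 2, True)
print(agentic_retrieve("What is the refund policy?"))  # (['doc1', 'doc3'], 3, False)
```

For the deployment question, the first search finds nothing. The reformulated query then retrieves both relevant chunks, so the loop stops in round 2. The refund question shows the escape hatch: reformulation does not improve the weak matches, so the loop stops at `max_rounds` and reports `sufficient=False` rather than silently generating from poor context.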
Journey Context:
Naive RAG (retrieve once, then generate) fails on the long tail of queries because (1) user queries are ambiguous and don't match the language of the documents, (2) a single retrieval pass can't cover multi-aspect questions, and (3) the model has no escape hatch when retrieval is poor, so it must generate from bad context. Agentic RAG fixes this by making retrieval an agent-controlled loop. The agent can reformulate queries (e.g., breaking 'How do I deploy?' into 'What are the deployment steps?' and 'What are the deployment prerequisites?'), search multiple indices, and evaluate whether the results answer the question.
Production teams report 2-3x accuracy improvements on complex queries. The tradeoff is higher latency and cost per query, because each query now makes multiple retrieval and LLM calls. Mitigate this with a max-retrieval-rounds limit (typically 3) and a sufficiency check that short-circuits the loop when results are already good. LlamaIndex's agentic RAG documentation codifies this pattern with query engine tools.
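The decomposition, multi-index search, and short-circuiting described above can be sketched as follows. This is a minimal illustration under stated assumptions. `INDICES`, `STOPWORDS`, `decompose`, `search_index`, and `gather_evidence` are hypothetical names, and word overlap stands in for embedding similarity. The decomposition is hard-coded where a real agent would call the LLM. The cap counts individual retrieval calls (default 3) as a simple stand-in for a max-retrieval-rounds limit.

```python
import re

# Hypothetical in-memory indices standing in for separate document stores.
INDICES = {
    "product_docs": {
        "steps": "Deployment steps: build the image and apply the manifest.",
    },
    "runbooks": {
        "prereqs": "Deployment prerequisites: install kubectl and get cluster credentials.",
        "rollback": "Rollback: revert to the previous image tag.",
    },
}
STOPWORDS = {"what", "are", "the", "how", "do", "i", "and", "to", "is", "a"}


def tokens(text: str) -> set[str]:
    """Lowercase content words, with stopwords removed."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def decompose(question: str) -> list[str]:
    """Hypothetical decomposition step; a real agent would ask the LLM to split the question."""
    if "deploy" in tokens(question):
        return ["What are the deployment steps?", "What are the deployment prerequisites?"]
    return [question]


def search_index(index_name: str, query: str, min_overlap: int = 2) -> list[str]:
    """Return chunk ids in one index sharing at least min_overlap content words with the query."""
    q = tokens(query)
    return [cid for cid, text in INDICES[index_name].items() if len(q & tokens(text)) >= min_overlap]


def gather_evidence(question: str, max_calls: int = 3) -> tuple[dict[str, tuple[str, list[str]]], int]:
    """Cover each sub-question, trying indices in order, under a total retrieval budget."""
    evidence: dict[str, tuple[str, list[str]]] = {}
    calls = 0
    for sub_q in decompose(question):
        for index_name in INDICES:
            if calls >= max_calls:
                return evidence, calls  # budget exhausted: answer with what we have
            calls += 1
            hits = search_index(index_name, sub_q)
            if hits:
                evidence[sub_q] = (index_name, hits)
                break  # sufficient for this aspect: skip the remaining indices
    return evidence, calls


evidence, calls = gather_evidence("How do I deploy?")
for sub_q, (index_name, hits) in evidence.items():
    print(f"{sub_q} -> {index_name}: {hits}")
print("retrieval calls:", calls)
```

For 'How do I deploy?', the steps sub-question is answered from `product_docs` on the first call. The prerequisites sub-question misses `product_docs` and is answered from `runbooks`, for three calls in total. With `max_calls=2`, the budget runs out before the prerequisites aspect is covered, which is the latency/cost versus coverage tradeoff in miniature.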
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:33:16.324501+00:00— report_created — created