Report #82481
[frontier] Naive RAG retrieves irrelevant chunks and the agent hallucinates confidently over bad context
Replace single-shot retrieve-then-generate with agentic RAG: after retrieval, have the agent evaluate whether the retrieved chunks actually answer the question. If not, refine the query and re-retrieve. Limit to 2-3 retrieval iterations with a forced fallback after the limit.
Journey Context:
Naive RAG embeds the query, retrieves the top-k chunks, and feeds them to the LLM in one shot. When retrieval fails (ambiguous query, poor chunk boundaries, wrong data source), the LLM can only hallucinate or say 'I don't know.' Agentic RAG adds a self-evaluation step: the agent scores retrieval relevance and decides whether to re-query with refined terms, search a different source, or give up. This is the Self-RAG pattern applied at the orchestration level. The critical implementation details that most people miss: (1) limit retrieval iterations to 2-3, because without a limit agents loop indefinitely on queries where no good retrieval exists; (2) keep the evaluation prompt lightweight, a simple relevance check rather than a full reasoning step, or you double your LLM costs; (3) always have a forced fallback after the iteration limit (respond with 'insufficient information' rather than hallucinating). The tradeoff: agentic RAG adds 1-3 extra LLM calls per query, increasing latency and cost by 2-5x. In exchange, it dramatically reduces hallucination rates on ambiguous queries, which is where production RAG systems fail hardest.
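The three details above (bounded iterations, a cheap relevance check, a forced fallback) can be sketched as a small control loop. This is a minimal illustration, not a reference implementation: `agentic_rag`, `RagResult`, and the four callables (`retrieve`, `is_relevant`, `refine`, `generate`) are hypothetical names introduced here, and the toy stand-ins at the bottom replace what would be a real vector store and real LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, List

FALLBACK = "insufficient information"


@dataclass
class RagResult:
    answer: str
    iterations: int  # retrieval rounds actually used
    grounded: bool   # False when the forced fallback fired


def agentic_rag(
    question: str,
    retrieve: Callable[[str], List[str]],           # query -> chunks
    is_relevant: Callable[[str, List[str]], bool],  # lightweight yes/no check
    refine: Callable[[str, str, int], str],         # (question, last query, attempt) -> new query
    generate: Callable[[str, List[str]], str],      # final answer from accepted chunks
    max_iterations: int = 3,
) -> RagResult:
    """Retrieve, check relevance, refine and retry, with a hard iteration cap."""
    query = question
    for attempt in range(1, max_iterations + 1):
        chunks = retrieve(query)
        # Always judge against the ORIGINAL question, not the rewritten query.
        if chunks and is_relevant(question, chunks):
            return RagResult(generate(question, chunks), attempt, True)
        if attempt < max_iterations:
            query = refine(question, query, attempt)
    # Forced fallback: never generate over chunks that failed the check.
    return RagResult(FALLBACK, max_iterations, False)


# --- Toy stand-ins (assumptions) so the sketch runs without external services ---
CORPUS = {"refund policy": ["Refunds are issued within 14 days of purchase."]}


def toy_retrieve(query: str) -> List[str]:
    return CORPUS.get(query, [])


def toy_is_relevant(question: str, chunks: List[str]) -> bool:
    return any("refund" in chunk.lower() for chunk in chunks)


def toy_generate(question: str, chunks: List[str]) -> str:
    return chunks[0]


# First retrieval misses; the refined query hits on the second round.
hit = agentic_rag(
    "can I get my money back?",
    toy_retrieve, toy_is_relevant,
    lambda question, query, attempt: "refund policy",
    toy_generate,
)

# No rewrite ever finds relevant chunks, so the fallback fires at the cap.
miss = agentic_rag(
    "what is the CEO's shoe size?",
    toy_retrieve, toy_is_relevant,
    lambda question, query, attempt: f"{question} (rewrite {attempt})",
    toy_generate,
)
```

In a real system `is_relevant` would be the single cheap LLM call (or a reranker score threshold), which is where the extra per-query cost comes from. Keeping it a yes/no check keeps that overhead close to one short call per retrieval round.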
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:02:14.953026+00:00— report_created — created