Report #35942
[frontier] Naive RAG returns irrelevant chunks and the model hallucinates over poor retrieval context
Replace single-shot retrieve-then-generate with an agentic retrieval loop: the agent decides IF retrieval is needed, formulates and reformulates queries, evaluates result relevance, and iterates or proceeds to answer based on what it finds.
Journey Context:
Naive RAG (embed the query, rank by cosine similarity, take the top-k chunks, stuff them into the prompt) fails in production for three reasons: (1) the initial query is ambiguous, (2) top-k returns chunks that are semantically similar but irrelevant, and (3) the model has no way to signal 'I need better context.' The winning pattern is agentic RAG, where the LLM controls retrieval as a tool: it decides when to search, rewrites queries based on initial results, makes multiple retrieval calls with different strategies, and explicitly decides whether it has enough context to answer or must say 'I don't know.' This costs more (multiple LLM calls + retrievals) but dramatically reduces hallucination on complex queries. The key insight: retrieval is not a preprocessing step; it is a tool the agent uses as part of reasoning, and the agent must be allowed to reject bad results.
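The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the corpus, the word-overlap `search`, and the functions `needs_retrieval`, `is_relevant`, `rewrite`, and `generate` are all hypothetical stand-ins. In a real system each of those four decisions would be an LLM call and `search` would query a vector store. The part that matters is the control flow in `agentic_answer`.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical toy corpus standing in for a document store.
CORPUS = [
    "The refund window for annual plans is 30 days.",
    "Monthly plans renew on the first day of each month.",
    "Our office plants are watered every Friday.",
]

# Hypothetical phrase -> canonical-term map; a real agent would use an LLM here.
SYNONYMS = {"money back": "refund", "yearly": "annual"}


def tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(query: str, k: int = 1) -> List[str]:
    """Naive top-k by word overlap. Always returns k chunks, relevant or not."""
    q = tokens(query)
    return sorted(CORPUS, key=lambda d: len(q & tokens(d)), reverse=True)[:k]


def needs_retrieval(question: str) -> bool:
    """Stand-in for the agent deciding IF retrieval is needed."""
    return not question.lower().startswith("hello")


def key_terms(question: str) -> Set[str]:
    q = question.lower()
    return {canon for phrase, canon in SYNONYMS.items() if phrase in q}


def is_relevant(question: str, chunk: str) -> bool:
    """Stand-in for an LLM relevance grader; it may reject every result."""
    terms = key_terms(question)
    return bool(terms) and terms <= tokens(chunk)


def rewrite(last_query: str) -> str:
    """Stand-in for LLM query reformulation."""
    q = last_query.lower()
    for phrase, canon in SYNONYMS.items():
        q = q.replace(phrase, canon)
    return q


def generate(question: str, context: List[str]) -> str:
    """Stand-in for the final LLM answer; here it just echoes the evidence."""
    return context[0] if context else "(answered without retrieval)"


@dataclass
class Result:
    answer: str
    queries: List[str] = field(default_factory=list)  # every query issued


def agentic_answer(question: str, max_rounds: int = 3) -> Result:
    result = Result(answer="I don't know")
    if not needs_retrieval(question):
        result.answer = generate(question, [])
        return result
    query = question
    for _ in range(max_rounds):
        result.queries.append(query)
        # Keep only chunks the grader accepts; bad results are discarded.
        kept = [c for c in search(query) if is_relevant(question, c)]
        if kept:
            result.answer = generate(question, kept)
            return result
        query = rewrite(query)  # reformulate and try again
    return result  # budget exhausted: refuse rather than guess


if __name__ == "__main__":
    for q in ["Can I get my money back on a yearly plan?",
              "What is the CEO's shoe size?",
              "Hello there"]:
        r = agentic_answer(q)
        print(q, "->", r.answer, f"({len(r.queries)} retrievals)")
```

In this toy run the first question retrieves the wrong chunk on its original wording, the grader rejects it, and the rewritten query finds the refund policy on the second round. The second question never passes the grader, so the loop ends in 'I don't know' after `max_rounds` attempts instead of answering from an irrelevant chunk. The `max_rounds` cap is one simple way to bound the extra cost noted above.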
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:48:15.806969+00:00— report_created — created