Report #57395
[frontier] Naive RAG retrieves irrelevant chunks and the agent cannot recover from bad retrieval
Replace single-shot RAG with agentic retrieval loops: give the agent multiple retrieval tools (semantic search, keyword search, SQL, web search), have it critique whether the retrieved context sufficiently answers the question, and let it iteratively reformulate queries or switch retrieval strategies until the context is adequate
Journey Context:
Naive RAG is a fixed pipeline: embed the query, search the vector DB, feed the top-k chunks to the LLM, generate. This fails when: (1) the query's wording doesn't match the language of the documents, (2) top-k misses the critical chunk, (3) the retrieved chunks are contradictory or insufficient, (4) the question requires information from multiple sources. The agent has no recovery mechanism; it must answer with whatever it retrieved. The emerging pattern is agentic retrieval: the agent has multiple retrieval tools and a self-critique step. After the initial retrieval, the agent assesses whether the context actually answers the question and, if not, what is missing. If the context is insufficient, it reformulates the query, tries a different retrieval strategy, or searches a different source. In effect, this gives the agent research skills rather than lookup skills. The self-critique step is the key innovation: it creates the feedback loop that naive RAG lacks. Production teams report 2-3x fewer hallucinations and significantly better recall on complex questions. Tradeoffs: more LLM calls (each critique is a turn), higher latency, and the risk of infinite retrieval loops (mitigate with a max-retrieval-step limit). For domains where retrieval quality matters more than latency and cost, this pattern is generally the better choice over single-shot RAG.
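The loop described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `agentic_retrieve`, `Critique`, `RetrievalResult`, the toy retrievers, and `toy_critique` are all hypothetical names invented for this example. In a real system the critique would be an LLM call and the tools would wrap an actual vector store, keyword index, SQL database, or web search.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Retriever = Callable[[str], List[str]]  # query -> retrieved chunks


@dataclass
class Critique:
    sufficient: bool
    next_tool: str = ""   # tool to try next (hypothetical field)
    next_query: str = ""  # reformulated query (hypothetical field)


@dataclass
class RetrievalResult:
    context: List[str]
    steps: int
    sufficient: bool
    trace: List[Tuple[str, str]] = field(default_factory=list)


def agentic_retrieve(
    question: str,
    tools: Dict[str, Retriever],
    critique: Callable[[str, List[str]], Critique],
    first_tool: str,
    max_steps: int = 4,
) -> RetrievalResult:
    """Retrieve, critique, then reformulate or switch tools, up to max_steps."""
    context: List[str] = []
    trace: List[Tuple[str, str]] = []
    tool, query = first_tool, question
    for step in range(1, max_steps + 1):
        trace.append((tool, query))
        for chunk in tools[tool](query):
            if chunk not in context:  # accumulate context without duplicates
                context.append(chunk)
        verdict = critique(question, context)
        if verdict.sufficient:
            return RetrievalResult(context, step, True, trace)
        # Follow the critique's suggestion; ignore unknown tool names.
        if verdict.next_tool in tools:
            tool = verdict.next_tool
        query = verdict.next_query or query
    # Step limit reached: return what we have and flag it as insufficient.
    return RetrievalResult(context, len(trace), False, trace)


# --- Toy stand-ins so the sketch runs end to end (all assumptions) ---
DOCS = [
    "Invoices are archived after 90 days.",
    "Archived invoices live in the cold_store table.",
]


def semantic_search(query: str) -> List[str]:
    # Pretend the embedding search only surfaces the first document.
    return [DOCS[0]] if "invoice" in query.lower() else []


def keyword_search(query: str) -> List[str]:
    words = query.lower().split()
    return [d for d in DOCS if any(w in d.lower() for w in words)]


def toy_critique(question: str, context: List[str]) -> Critique:
    # Stand-in for an LLM call that judges the context and proposes a next move.
    if any("cold_store" in c for c in context):
        return Critique(sufficient=True)
    return Critique(False, next_tool="keyword", next_query="archived table")


TOOLS = {"semantic": semantic_search, "keyword": keyword_search}
result = agentic_retrieve(
    "Where do old invoices end up?", TOOLS, toy_critique, first_tool="semantic"
)
print(result.steps, result.sufficient, result.trace)
```

When the step limit is hit, the sketch returns the accumulated context with `sufficient=False` rather than raising, so the caller can decide whether to answer with caveats or decline. That is one possible design choice, not the only reasonable one.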
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:49:43.857026+00:00— report_created — created