Report #56831
[frontier] Naive RAG (embed query, then retrieve, then generate) produces irrelevant or incomplete answers for complex questions
Replace naive RAG with an agentic retrieval loop: (1) Query Rewriting: use an LLM to decompose the question into sub-queries and rewrite each one for retrieval (dropping conversational filler and making implicit context explicit); (2) Multi-Query Retrieval: run a separate retrieval call for each query formulation and merge the results; (3) Relevance Evaluation: use an LLM to score each retrieved chunk's relevance to the original question; (4) Self-Correction: if the relevant chunks are not sufficient to answer the question, generate new queries and retrieve again; (5) Generation: synthesize the answer from the relevant chunks only. Implement the loop as a state machine whose conditional edges depend on the relevance scores (for example, insufficient relevant context routes back to retrieval, and sufficient context routes to generation).
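The five steps can be sketched as a small state machine. This is a toy under loud assumptions: `rewrite_query`, `retrieve`, `score_relevance`, `propose_followups` and the generate node are hypothetical keyword heuristics standing in for LLM calls and a vector store, and the `threshold`, `k` and `max_rounds` values are arbitrary. It is meant to show only the control flow: decomposition, multi-query retrieval, a relevance filter against the original question, and a conditional edge that routes back to retrieval while relevant chunks leave part of the question uncovered.

```python
"""Toy agentic RAG loop written as a state machine.

Every component below is a hypothetical stand-in: a real system would call an
LLM in rewrite_query, score_relevance, propose_followups and the generate node,
and a vector store in retrieve. Keyword heuristics are used so the control flow
runs without external services.
"""
import re
from dataclasses import dataclass, field

# Conversational filler dropped during query rewriting (illustrative list).
FILLER = {"a", "an", "the", "is", "are", "what", "how", "why", "and", "of",
          "to", "in", "for", "can", "you", "me", "tell", "do", "does", "at", "by"}


def terms(text: str) -> set[str]:
    """Lowercased content words with filler removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in FILLER}


def rewrite_query(question: str) -> list[str]:
    """Stand-in for LLM decomposition: split on 'and'/'?' and strip filler."""
    parts = re.split(r"\band\b|[?;]", question.lower())
    return [" ".join(sorted(terms(p))) for p in parts if terms(p)]


def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> dict[str, str]:
    """Stand-in for vector search: top-k chunks by number of shared terms."""
    q = terms(query)
    ranked = sorted(corpus.items(), key=lambda kv: -len(q & terms(kv[1])))
    return {cid: text for cid, text in ranked[:k] if q & terms(text)}


def score_relevance(question: str, chunk: str) -> float:
    """Stand-in for an LLM grader: share of question terms the chunk contains."""
    q = terms(question)
    return len(q & terms(chunk)) / len(q) if q else 0.0


def propose_followups(question: str, relevant: dict[str, str]) -> list[str]:
    """Stand-in for LLM query regeneration: ask for still-uncovered terms."""
    covered = set().union(*(terms(c) for c in relevant.values()))
    missing = terms(question) - covered
    return [" ".join(sorted(missing))] if missing else []


@dataclass
class State:
    question: str
    queries: list[str] = field(default_factory=list)
    candidates: dict[str, str] = field(default_factory=dict)
    relevant: dict[str, str] = field(default_factory=dict)
    rounds: int = 0
    answer: str = ""


def run(question: str, corpus: dict[str, str], threshold: float = 0.5,
        max_rounds: int = 3) -> State:
    s, node = State(question), "rewrite"
    while node != "done":
        if node == "rewrite":
            s.queries, node = rewrite_query(question), "retrieve"
        elif node == "retrieve":  # multi-query retrieval, results merged
            for q in s.queries:
                s.candidates.update(retrieve(q, corpus))
            node = "evaluate"
        elif node == "evaluate":  # grade every candidate against the ORIGINAL question
            s.relevant = {cid: c for cid, c in s.candidates.items()
                          if score_relevance(question, c) >= threshold}
            s.rounds += 1
            # Conditional edge: uncovered terms -> self-correct, else generate.
            insufficient = bool(propose_followups(question, s.relevant))
            node = "correct" if insufficient and s.rounds < max_rounds else "generate"
        elif node == "correct":
            s.queries, node = propose_followups(question, s.relevant), "retrieve"
        elif node == "generate":  # stand-in for LLM synthesis over relevant chunks only
            s.answer, node = " ".join(s.relevant.values()), "done"
    return s


CORPUS = {
    "timeout": "Upload timeout defaults to 30 seconds.",
    "session": "Session timeout is configured per user.",
    "backoff": "Upload retries use exponential backoff.",
    "cap": "Upload retries are capped at five attempts.",
    "mechanism": "Retries work by re-sending the whole file.",
}

if __name__ == "__main__":
    state = run("Can you tell me the upload timeout, and how do upload retries work?", CORPUS)
    print(state.rounds, list(state.relevant))
    print(state.answer)
```

On this sample corpus, the first pass retrieves the off-topic `session` chunk (dropped at evaluation) and misses `mechanism`; the coverage check then issues a follow-up query for the uncovered term `work`, which finds it in the second round. In a real system the sufficiency check would more likely be an LLM judgment than term overlap.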
Journey Context:
Naive RAG fails on complex questions for three reasons: (1) the user's natural-language query is a poor retrieval query, because it contains conversational filler, ambiguous terms, and implicit context; (2) a single retrieval pass misses information that a differently phrased query would find; (3) retrieved chunks are used uncritically, polluting the context with irrelevant information. The agentic RAG pattern turns retrieval into an iterative, self-correcting process. The key insight is that retrieval benefits from the same attempt, check, and refine cycle used in code debugging. The tradeoff is higher latency and cost (3-5x more LLM calls), but for complex domains the quality improvement can be dramatic. A common mistake is adding more retrieval steps without adding evaluation: more retrieval without relevance filtering just adds noise. The evaluation step is the critical innovation, because it lets the agent distinguish signal from noise and decide when to stop retrieving.
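The claim that evaluation both removes noise and supplies a stopping signal can be illustrated with a small sketch. It assumes each retrieved chunk already carries a relevance score in [0, 1] from some grader; the chunk ids and scores below are made-up toy inputs, not measurements. The stopping rule (stop at the first round that adds no new chunk at or above the threshold) is a hypothetical choice for illustration, not the only reasonable one.

```python
from typing import Iterable, Optional

# (chunk_id, relevance score in [0, 1]) as produced by a hypothetical grader.
Chunk = tuple[str, float]


def gather(rounds: Iterable[list[Chunk]], threshold: float = 0.5,
           filter_chunks: bool = True) -> tuple[dict[str, float], Optional[int]]:
    """Accumulate chunks over retrieval rounds.

    Returns (context, stop_round). With filtering, only chunks scoring at least
    `threshold` enter the context, and the loop stops at the first round that
    contributes nothing new; stop_round is that round's 1-based index. Without
    filtering, every chunk is kept and stop_round is None (no stop signal).
    """
    context: dict[str, float] = {}
    for i, batch in enumerate(rounds, start=1):
        new = {cid: s for cid, s in batch
               if cid not in context and (not filter_chunks or s >= threshold)}
        if filter_chunks and not new:
            return context, i  # nothing relevant added: stop retrieving
        context.update(new)
    return context, None


def precision(context: dict[str, float], threshold: float = 0.5) -> float:
    """Share of context chunks that clear the threshold (measurement only)."""
    if not context:
        return 0.0
    return sum(s >= threshold for s in context.values()) / len(context)


# Toy per-round retrieval results; scores are illustrative, not measured.
ROUNDS = [
    [("a", 0.9), ("b", 0.2)],
    [("c", 0.8), ("d", 0.1)],
    [("e", 0.3), ("f", 0.2)],
    [("g", 0.1), ("h", 0.4)],
]

if __name__ == "__main__":
    for flag in (True, False):
        ctx, stop = gather(ROUNDS, filter_chunks=flag)
        print(f"filter={flag}: {sorted(ctx)} precision={precision(ctx):.2f} stop={stop}")
```

On these toy rounds, the filtered path keeps `a` and `c` and stops at round 3, while the unfiltered path keeps all eight chunks, only two of which clear the threshold. If `rounds` is a generator that calls the retriever lazily, the early return also skips the remaining retrieval calls, which can offset some of the extra LLM calls the evaluation step adds.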
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:52:49.568121+00:00— report_created — created