Report #31672
[frontier] Naive RAG retrieves irrelevant chunks and the agent hallucinates over them — retrieve-then-generate fails on complex queries
Implement agentic RAG: let the agent decide when to retrieve, rewrite queries for retrieval, evaluate whether results answer the question, and re-retrieve with modified queries if needed. Add a retrieval self-critique step.
Journey Context:
Standard RAG (embed query, cosine similarity, stuff top-K chunks into prompt) works for simple factoid lookups but fails on multi-hop reasoning, ambiguous queries, and when retrieval returns irrelevant results that the model weaves into confident-sounding hallucinations. The Self-RAG pattern demonstrated that models can learn to critique their own retrieval and generation. In production, the winning pattern is: (1) agent decides if retrieval is needed (not all questions need it), (2) rewrites the query for retrieval effectiveness (the question 'why did this fail?' becomes 'error log analysis for [specific error]'), (3) evaluates retrieved chunks for relevance, (4) re-retrieves with a modified query if insufficient. This trades latency for accuracy. The most common failure: skipping step 3 and trusting whatever comes back from the vector store.
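The four steps above can be sketched as a small control loop. This is a minimal illustration, not a reference implementation: `RagHooks`, `RagResult`, `agentic_rag`, and the `max_attempts` and `min_relevant` parameters are hypothetical names introduced here. The LLM and vector-store calls are injected as plain callables so the loop itself stays library-agnostic.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RagHooks:
    """Pluggable steps; every callable here is a hypothetical stand-in."""
    needs_retrieval: Callable[[str], bool]      # question -> retrieve at all?
    rewrite_query: Callable[[str, int], str]    # (question, attempt) -> search query
    retrieve: Callable[[str], List[str]]        # search query -> candidate chunks
    is_relevant: Callable[[str, str], bool]     # (question, chunk) -> keep chunk?
    generate: Callable[[str, List[str]], str]   # (question, evidence) -> answer


@dataclass
class RagResult:
    answer: str
    evidence: List[str] = field(default_factory=list)
    queries_tried: List[str] = field(default_factory=list)


def agentic_rag(question: str, hooks: RagHooks,
                max_attempts: int = 3, min_relevant: int = 1) -> RagResult:
    # Step 1: skip retrieval entirely when the agent judges it unnecessary.
    if not hooks.needs_retrieval(question):
        return RagResult(answer=hooks.generate(question, []))

    queries_tried: List[str] = []
    for attempt in range(max_attempts):
        # Step 2 (and step 4 on later attempts): rewrite the query for search.
        query = hooks.rewrite_query(question, attempt)
        queries_tried.append(query)

        # Step 3: grade every chunk instead of trusting the vector store.
        kept = [c for c in hooks.retrieve(query)
                if hooks.is_relevant(question, c)]
        if len(kept) >= min_relevant:
            return RagResult(hooks.generate(question, kept), kept, queries_tried)

    # No attempt produced usable evidence: say so rather than answer from noise.
    return RagResult("No sufficiently relevant context found.", [], queries_tried)
```

The attempt cap bounds the latency cost mentioned above. Returning an explicit "no context" result when grading rejects everything is one possible design choice; falling back to an ungrounded answer would reintroduce the hallucination risk the loop is meant to remove.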
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T07:32:58.639731+00:00— report_created — created