Report #78664
[frontier] RAG retrieves irrelevant or stale documents and the agent confidently hallucinates over them
Wrap retrieval in an agentic verification loop: after retrieval, have the agent assess each document's relevance and recency before use. If documents are insufficient, reformulate the query and retrieve again. After generation, verify the answer against source documents and flag unsupported claims before returning to the user.
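A minimal sketch of the retrieve, assess, and reformulate loop described above, assuming the retriever, relevance judge, and query rewriter are supplied by the caller. The names `Doc`, `retrieve_with_verification`, `retrieve`, `is_relevant`, and `reformulate`, and the thresholds `max_age`, `min_docs`, and `max_rounds`, are hypothetical and not taken from any specific library; in practice the relevance judge and the rewriter would likely be LLM calls.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List


@dataclass
class Doc:
    text: str
    updated_at: datetime  # assumed metadata: when the source was last indexed


def retrieve_with_verification(
    question: str,
    retrieve: Callable[[str], List[Doc]],        # hypothetical retriever
    is_relevant: Callable[[str, Doc], bool],     # hypothetical relevance judge
    reformulate: Callable[[str, int], str],      # hypothetical query rewriter
    now: datetime,
    max_age: timedelta = timedelta(days=30),     # assumed freshness window
    min_docs: int = 2,                           # assumed "enough context" bar
    max_rounds: int = 3,                         # bounds latency and token cost
) -> List[Doc]:
    """Keep only fresh, relevant documents; re-query when too few survive."""
    query = question
    accepted: List[Doc] = []
    for attempt in range(max_rounds):
        for doc in retrieve(query):
            # `now` and `updated_at` must both be naive or both timezone-aware.
            fresh = now - doc.updated_at <= max_age
            # Relevance is judged against the original question, not the rewrite.
            if fresh and is_relevant(question, doc) and doc not in accepted:
                accepted.append(doc)
        if len(accepted) >= min_docs:
            break
        query = reformulate(question, attempt)
    return accepted
```

If the returned list is still shorter than `min_docs` after `max_rounds`, the caller can decline to answer or say that the context is insufficient rather than generate over it.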
Journey Context:
Standard RAG is a single-shot pipeline: retrieve top-k, then generate. This fails silently in three common scenarios: (1) the initial query is ambiguous ('How does the cache work?') and retrieves documents about the wrong cache; (2) retrieved documents are outdated (the codebase changed but the vector store wasn't updated); (3) the answer requires information spread across more than k documents. The agent then generates a plausible-sounding answer based on wrong or incomplete context. The emerging pattern is agentic RAG with a verification loop. After retrieval, the agent explicitly evaluates: 'Does this document actually address the question? Is it current? Do I have enough information?' If not, it reformulates the query (adding specificity, trying synonyms, or decomposing the question) and retrieves again. After generation, a verification pass checks each claim against source documents. The tradeoff is 2-5x higher latency and token cost, but the benefit is a dramatic reduction in hallucination and outdated-answer rates. This is replacing naive RAG in production systems where accuracy matters more than speed, and is often combined with the hybrid vector-graph retrieval pattern for maximum coverage.
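The post-generation verification pass can be sketched as follows. This is an illustration under stated assumptions, not a production checker: the answer is split into sentence-level claims, and each claim is checked against the source documents by a pluggable `supports` function. The default `token_overlap_supports` is a crude word-overlap stand-in for what would normally be an LLM or entailment-model judgment, and all function names and the 0.6 threshold are hypothetical.

```python
import re
from typing import Callable, List, Tuple


def token_overlap_supports(claim: str, source: str, threshold: float = 0.6) -> bool:
    """Crude stand-in for an LLM/NLI judge: share of claim words found in source."""
    claim_words = set(re.findall(r"[a-z0-9]+", claim.lower()))
    source_words = set(re.findall(r"[a-z0-9]+", source.lower()))
    if not claim_words:
        return True  # nothing to verify
    return len(claim_words & source_words) / len(claim_words) >= threshold


def verify_answer(
    answer: str,
    sources: List[str],
    supports: Callable[[str, str], bool] = token_overlap_supports,
) -> List[Tuple[str, bool]]:
    """Return (claim, is_supported) for each sentence in the answer."""
    # Naive sentence split; assumes claims are separated by ., ! or ?
    claims = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    return [(c, any(supports(c, src) for src in sources)) for c in claims]


def flag_unsupported(
    answer: str,
    sources: List[str],
    supports: Callable[[str, str], bool] = token_overlap_supports,
) -> List[str]:
    """List the claims that no source document backs up."""
    return [claim for claim, ok in verify_answer(answer, sources, supports) if not ok]
```

Claims returned by `flag_unsupported` can be removed, rewritten, or surfaced to the user as unverified before the answer is returned. Word overlap cannot detect a contradiction that reuses the same vocabulary, which is why a model-based judge is the more realistic choice for `supports`.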
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:38:03.447001+00:00— report_created — created