Report #92982
[frontier] RAG pipeline returns irrelevant chunks and agent hallucinates confident answers from bad retrieval
Add a retrieval quality gate: after retrieval, have the agent evaluate chunk relevance before generation. If relevance is below a threshold, reformulate the query with different keywords or a different retrieval strategy and re-retrieve. Loop up to N times, and generate only when retrieval quality passes.
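A minimal sketch of the gate, assuming the retriever, the relevance scorer, and the query reformulator are supplied as plain callables. The names `gated_retrieve`, `retrieve`, `score_relevance`, and `reformulate`, and the defaults `threshold=0.7` and `max_attempts=3`, are hypothetical and do not come from this report. In a real pipeline the scorer and the reformulator would typically be LLM calls; here they are toy stand-ins so the loop can run on its own.

```python
from typing import Callable, List, Optional, Tuple


def gated_retrieve(
    question: str,
    retrieve: Callable[[str], List[str]],
    score_relevance: Callable[[str, List[str]], float],
    reformulate: Callable[[str, int], str],
    threshold: float = 0.7,   # assumed cutoff, tune per corpus
    max_attempts: int = 3,    # the "N" in "loop up to N times"
) -> Tuple[Optional[List[str]], int]:
    """Return (chunks, attempts_used) if the gate passes, else (None, max_attempts)."""
    query = question
    for attempt in range(1, max_attempts + 1):
        chunks = retrieve(query)
        # Relevance is always judged against the original question,
        # not against the reformulated query.
        if chunks and score_relevance(question, chunks) >= threshold:
            return chunks, attempt
        query = reformulate(question, attempt)
    return None, max_attempts  # gate never passed: do not generate


# Toy stand-ins (hypothetical) for a vector store and two LLM calls.
CORPUS = {"reset password": ["To reset a password, open Settings > Security."]}


def toy_retrieve(query: str) -> List[str]:
    return CORPUS.get(query, ["Unrelated chunk about billing."])


def toy_score(question: str, chunks: List[str]) -> float:
    return 1.0 if any("password" in c for c in chunks) else 0.0


def toy_reformulate(question: str, attempt: int) -> str:
    return "reset password"


result = gated_retrieve(
    "How do I change my login secret?", toy_retrieve, toy_score, toy_reformulate
)
failed = gated_retrieve(
    "How do I change my login secret?", toy_retrieve, toy_score, lambda q, a: "nothing"
)
```

In this toy run the first retrieval fails the gate and the second, reformulated query passes it. When every attempt fails, the function returns `None` so the caller can decline to answer instead of generating from weak context.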
Journey Context:
Naive RAG does retrieve-then-generate in a single pass. When retrieval fails, which happens often with semantic search on domain-specific corpora, the agent either hallucinates from irrelevant context or refuses to answer. The fix is agentic RAG: give the agent control over the retrieval loop. The agent assesses whether the retrieved chunks actually answer the question and, if they do not, tries a different query formulation, switches from semantic to keyword search, or decomposes the question into sub-questions. This adds 1-3 extra LLM calls per query but substantially reduces hallucination from bad context, which is the most dangerous failure mode because the agent sounds confident while being wrong. The tradeoff is latency versus accuracy, and for production systems where a wrong answer costs more than a slow one, accuracy wins. This pattern is increasingly replacing naive RAG in production implementations.
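The strategy switch described above can be sketched as an ordered ladder of retrievers that the agent walks until one passes a relevance check. Everything named here is an assumption for illustration: `retrieve_with_escalation`, the toy `semantic_search` and `keyword_search` functions, the `judge` stand-in for an LLM relevance check, and the two-document corpus. A question-decomposition strategy could be appended as a third rung in the same way.

```python
from typing import Callable, List, Tuple

Strategy = Callable[[str], List[str]]


def retrieve_with_escalation(
    question: str,
    strategies: List[Tuple[str, Strategy]],
    is_relevant: Callable[[str, List[str]], bool],
) -> Tuple[List[str], List[Tuple[str, bool]]]:
    """Try each strategy in order and stop at the first whose chunks pass the check."""
    trace: List[Tuple[str, bool]] = []
    for name, strategy in strategies:
        chunks = strategy(question)
        passed = bool(chunks) and is_relevant(question, chunks)
        trace.append((name, passed))  # one relevance check per rung tried
        if passed:
            return chunks, trace
    return [], trace  # nothing passed: caller should decline rather than generate


# Toy corpus and retrievers (hypothetical) so the sketch runs without a vector store.
DOCS = [
    "Error E42 means the license file is missing.",
    "Billing runs on the first day of each month.",
]


def semantic_search(question: str) -> List[str]:
    # Simulates an embedding miss on a domain-specific error code.
    return [DOCS[1]]


def keyword_search(question: str) -> List[str]:
    # Exact match on identifier-like tokens (those containing a digit).
    terms = [t.strip("?.,").lower() for t in question.split()]
    ids = [t for t in terms if any(ch.isdigit() for ch in t)]
    return [d for d in DOCS if any(i in d.lower() for i in ids)]


def judge(question: str, chunks: List[str]) -> bool:
    # Stand-in for an LLM relevance check.
    return any("E42" in c for c in chunks)


chunks, trace = retrieve_with_escalation(
    "What does error E42 mean?",
    [("semantic", semantic_search), ("keyword", keyword_search)],
    judge,
)
```

The returned `trace` records which rungs were tried and whether each passed, which gives a direct count of the extra relevance checks a query cost.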
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:39:31.497524+00:00— report_created — created