Report #77699
[frontier] RAG retrieves irrelevant chunks and the LLM generates answers based on incorrect context
Implement Self-RAG: interleave generation with adaptive retrieval using special control tokens (e.g., [Retrieve], [IsRelevant]) that let the model decide when to retrieve, critique retrieved passages for relevance, and regenerate when its answer is not supported by those passages.
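A minimal sketch of that control loop, assuming the model and retriever are exposed as plain Python callables. The names `self_rag_answer`, `decide`, `retrieve`, `is_relevant`, `generate`, `is_supported`, the `[NoRetrieve]` token and `max_attempts` are hypothetical, chosen for illustration; a fine-tuned Self-RAG checkpoint would emit its own token strings, which you would map onto these. The toy stubs at the bottom only demonstrate the flow: the first draft is rejected by the support check and the second is accepted.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical token strings for the retrieval decision; a fine-tuned Self-RAG
# model defines its own vocabulary, so map its outputs onto these.
RETRIEVE = "[Retrieve]"
NO_RETRIEVE = "[NoRetrieve]"


@dataclass
class Answer:
    text: str
    passages: List[str]  # passages that survived the relevance critique
    supported: bool      # True only if the support check passed
    attempts: int


def self_rag_answer(
    question: str,
    decide: Callable[[str], str],                    # emits RETRIEVE or NO_RETRIEVE
    retrieve: Callable[[str], List[str]],            # returns candidate passages
    is_relevant: Callable[[str, str], bool],         # [IsRelevant] critique per passage
    generate: Callable[[str, List[str], int], str],  # draft answer; attempt number lets it vary
    is_supported: Callable[[str, List[str]], bool],  # [Support] check of answer vs passages
    max_attempts: int = 3,
) -> Answer:
    passages: List[str] = []
    # Adaptive retrieval: only fetch documents when the model asks for them.
    if decide(question) == RETRIEVE:
        # Drop passages the relevance critique rejects instead of trusting top-k blindly.
        passages = [p for p in retrieve(question) if is_relevant(question, p)]

    draft = ""
    for attempt in range(1, max_attempts + 1):
        draft = generate(question, passages, attempt)
        if not passages:
            # No evidence to verify against: return the draft but flag it as unsupported.
            return Answer(draft, passages, False, attempt)
        if is_supported(draft, passages):
            return Answer(draft, passages, True, attempt)
        # Unsupported draft: loop and regenerate.
    return Answer(draft, passages, False, max_attempts)


# Toy stubs standing in for the model and retriever (all hypothetical).
answer = self_rag_answer(
    "What is the capital of France?",
    decide=lambda q: RETRIEVE,
    retrieve=lambda q: ["Paris is the capital of France.", "Bananas are rich in potassium."],
    is_relevant=lambda q, p: "capital" in p,
    generate=lambda q, ps, attempt: "Lyon" if attempt == 1 else "Paris",
    is_supported=lambda a, ps: any(a in p for p in ps),
)
print(answer)
# Answer(text='Paris', passages=['Paris is the capital of France.'], supported=True, attempts=2)
```

Keeping the critique functions separate from `generate` makes it easy to swap in either real reflection tokens from a fine-tuned model or parsed outputs from a few-shot prompted model, without changing the loop itself.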
Journey Context:
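Standard RAG is single-shot: retrieve, then generate. If retrieval returns irrelevant chunks, the model still conditions on them and can produce confident but unsupported answers. Corrective RAG (CRAG) adds a separate retrieval evaluator, which is one more component to build and maintain. Self-RAG instead trains the model to emit reflection tokens itself: [Retrieve] to fetch documents, [IsRelevant] to score each retrieved passage, and [Support] to check that the final answer is backed by the retrieved facts. This allows iterative refinement without a separate evaluator model. Tradeoff: it requires either a fine-tuned model or few-shot prompting with a fixed token vocabulary, and every retrieval and critique step adds generation calls. Reportedly favored in production for high-stakes Q&A agents, where accuracy matters more than latency.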
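With the few-shot prompting route, the reflection tokens arrive as plain text inside the model output, so the controller has to parse them out. A minimal sketch, assuming a hypothetical `[Name: value]` format such as `[IsRelevant: yes]` or `[Support: full]` that the few-shot examples would teach the model; the value vocabulary (`yes`, `full`, `partial`) and the helpers `parse_reflection`, `wants_retrieval` and `fully_supported` are illustrative, not part of any published Self-RAG interface.

```python
import re
from typing import List, Tuple

# Hypothetical text format for reflection tokens when a general model is
# few-shot prompted to emit them, e.g. "[IsRelevant: yes]" or "[Support: full]".
TOKEN_RE = re.compile(r"\[(Retrieve|IsRelevant|Support)(?::\s*([A-Za-z]+))?\]")

Tokens = List[Tuple[str, str]]


def parse_reflection(output: str) -> Tuple[str, Tokens]:
    """Return the answer text with tokens stripped, plus (token, value) pairs in order."""
    tokens = [(m.group(1), (m.group(2) or "").lower()) for m in TOKEN_RE.finditer(output)]
    text = re.sub(r"\s+", " ", TOKEN_RE.sub("", output)).strip()
    return text, tokens


def wants_retrieval(tokens: Tokens) -> bool:
    # Any [Retrieve] token means the model is asking for documents.
    return any(name == "Retrieve" for name, _ in tokens)


def fully_supported(tokens: Tokens) -> bool:
    # Use the last [Support] verdict; a missing one counts as unsupported
    # so the caller regenerates rather than trusting the draft.
    values = [value for name, value in tokens if name == "Support"]
    return bool(values) and values[-1] == "full"


text, tokens = parse_reflection("Paris is the capital. [IsRelevant: yes] [Support: full]")
print(text, tokens, fully_supported(tokens))
```

Treating a missing or partial [Support] verdict as a failure is a deliberately conservative choice for accuracy-first agents; a latency-sensitive deployment might accept partial support instead.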
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:00:45.979585+00:00— report_created — created