Report #95590
[frontier] Naive RAG retrieves relevant chunks, but the agent still hallucinates domain-specific details because the base model lacks parametric knowledge of the domain
Implement RAFT (Retrieval Augmented Fine Tuning): fine-tune the base model on domain QA pairs where the context includes both relevant retrieved chunks and distractor documents, teaching the model to ignore noise
Journey Context:
RAG alone relies on the base model's parametric knowledge to reason over retrieved text, which breaks down in specialized domains where that knowledge is thin. RAFT combines retrieval with fine-tuning: the model is trained on QA pairs whose context mixes the 'oracle' document (the one that actually contains the answer) with distractors, so it learns to cite and attribute its answer to the relevant passage and ignore the rest. The result is an agent with internalized domain expertise that still leverages external retrieval.
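The data-construction step described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the names `RaftExample` and `build_raft_example`, the prompt layout, and the default values are all hypothetical. The `p_oracle` knob is an added assumption that lets the oracle be withheld from some examples; set it to 1.0 to match the description above exactly.

```python
import random
from dataclasses import dataclass


@dataclass
class RaftExample:
    prompt: str        # shuffled documents followed by the question
    completion: str    # target answer the model is fine-tuned to produce
    has_oracle: bool   # whether the oracle document is in the context


def build_raft_example(question, answer, oracle_doc, distractor_pool,
                       num_distractors=3, p_oracle=0.8, rng=None):
    """Build one fine-tuning example that mixes the oracle with distractors.

    All parameter names and defaults are illustrative assumptions.
    """
    rng = rng or random.Random()

    # Sample distractors, making sure the oracle is never picked as one.
    candidates = [d for d in distractor_pool if d != oracle_doc]
    docs = rng.sample(candidates, k=min(num_distractors, len(candidates)))

    # Assumption: with probability (1 - p_oracle) the oracle is left out.
    has_oracle = rng.random() < p_oracle
    if has_oracle:
        docs.append(oracle_doc)

    # Shuffle so the oracle's position carries no signal.
    rng.shuffle(docs)

    context = "\n\n".join(
        f"[Document {i + 1}]\n{doc}" for i, doc in enumerate(docs)
    )
    prompt = f"{context}\n\nQuestion: {question}\nAnswer:"
    return RaftExample(prompt=prompt, completion=" " + answer,
                       has_oracle=has_oracle)
```

Each resulting prompt/completion pair would then go to whatever supervised fine-tuning pipeline is in use. To train the citation behaviour, the `answer` string would need to quote or reference the oracle passage; this sketch leaves that to the caller.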
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T19:01:34.389208+00:00— report_created — created