Report #1936
[research] RAG systems cite retrieved documents that do not actually support the generated claim
After generating each factual claim, run an entailment check against the specific retrieved passage, and attach the citation only if that passage supports the claim. If no retrieved span supports a claim, remove the claim or flag it as unsupported speculation.
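A minimal sketch of that gating step, assuming the answer has already been split into individual claims. The names `attach_citations`, `CheckedClaim`, and `EntailmentScorer` are hypothetical, and the 0.5 threshold is an assumed default rather than a recommended value. `token_overlap_scorer` is a toy stand-in so the example runs without a model; it measures word overlap, not entailment, and should be replaced by a real NLI model or entailment prompt.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

# Hypothetical signature: (premise_passage, claim) -> support score in [0, 1].
EntailmentScorer = Callable[[str, str], float]


@dataclass
class CheckedClaim:
    text: str
    citation: Optional[int]  # index of the supporting passage, or None
    supported: bool


def attach_citations(
    claims: Sequence[str],
    passages: Sequence[str],
    scorer: EntailmentScorer,
    threshold: float = 0.5,  # assumed cutoff; tune on labeled data
) -> List[CheckedClaim]:
    """Cite a passage only when the scorer says it supports the claim."""
    checked = []
    for claim in claims:
        best_idx, best_score = None, 0.0
        for idx, passage in enumerate(passages):
            score = scorer(passage, claim)
            if score > best_score:
                best_idx, best_score = idx, score
        if best_idx is not None and best_score >= threshold:
            checked.append(CheckedClaim(claim, best_idx, True))
        else:
            # No passage clears the threshold: keep the claim but flag it,
            # so the caller can drop it or label it as unsupported.
            checked.append(CheckedClaim(claim, None, False))
    return checked


def token_overlap_scorer(premise: str, claim: str) -> float:
    """Toy stand-in, NOT real entailment: share of claim tokens found in the premise."""
    claim_tokens = set(claim.lower().split())
    premise_tokens = set(premise.lower().split())
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & premise_tokens) / len(claim_tokens)
```

Passing the scorer in as a callable keeps the gating logic independent of whichever checker is used, so swapping the toy scorer for a model-backed one does not change the citation policy.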
Journey Context:
The LLM-AggreFact benchmark aggregates grounded-generation datasets and shows that models frequently produce claims unsupported by the provided context. Surface-level citation formatting is easy; real grounding requires verifying that each claim is entailed by a source sentence. NLI-based checkers like MiniCheck or custom entailment prompts can automate this. The common error is evaluating only final-answer correctness, which hides cases where the model invented the answer and then attached a nearby citation.
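One way to avoid that evaluation gap is to report a per-claim grounding score next to answer accuracy. The sketch below assumes each evaluation record already carries per-citation support flags from an entailment checker or human annotation; `EvalRecord` and `summarize` are hypothetical names, not part of any benchmark's tooling.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EvalRecord:
    answer_correct: bool             # final answer matches the reference
    citation_supported: List[bool]   # per cited claim: did the cited passage support it?


def summarize(records: List[EvalRecord]) -> Dict[str, float]:
    """Report answer accuracy and citation precision side by side."""
    n = len(records)
    # Flatten the per-claim flags across all records.
    flags = [flag for record in records for flag in record.citation_supported]
    return {
        "answer_accuracy": sum(r.answer_correct for r in records) / n if n else 0.0,
        "citation_precision": sum(flags) / len(flags) if flags else 0.0,
    }
```

A run where every final answer is correct but few citations hold up shows high `answer_accuracy` and low `citation_precision`, which is the failure that accuracy alone hides.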
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T08:59:53.338144+00:00 - report_created - created