Report #13995
[research] Generating citations where the referenced document exists but does not actually support the claim being made
Implement two-pass verification: first generate the claim, then run an independent NLI (Natural Language Inference) classifier to check that the cited source entails the claim before it is output.
Journey Context:
RAG models often treat citations as a formatting task (e.g., just appending `[1]`). This leads to 'source hacking', where a relevant but non-entailing document is cited so the answer appears authoritative. An independent NLI step decouples generation from citation validation, so a citation is kept only when the source actually supports the claim, not merely because the citation is well formed.
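The second pass described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `CitedClaim`, `verify_citations`, the `0.9` threshold, and the `[citation needed]` fallback are all hypothetical choices made for the example, and `toy_entail_prob` is a word-overlap stand-in that only marks where a real NLI classifier would be called.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: maps (premise, hypothesis) to P(entailment) in [0, 1].
# In practice this would wrap a real NLI model kept separate from the generator.
NliScorer = Callable[[str, str], float]


@dataclass
class CitedClaim:
    claim: str        # sentence produced by the generation pass
    source_id: int    # number of the cited document
    source_text: str  # passage the citation points at


def verify_citations(
    claims: List[CitedClaim],
    entail_prob: NliScorer,
    threshold: float = 0.9,  # assumed cutoff; tune on labelled data
) -> List[str]:
    """Second pass: keep a citation only if the source entails the claim."""
    rendered = []
    for c in claims:
        if entail_prob(c.source_text, c.claim) >= threshold:
            rendered.append(f"{c.claim} [{c.source_id}]")
        else:
            # Relevant but non-entailing source: drop the citation and flag it.
            # (Assumed policy; regenerating or removing the claim are alternatives.)
            rendered.append(f"{c.claim} [citation needed]")
    return rendered


def toy_entail_prob(premise: str, hypothesis: str) -> float:
    """Demo-only stand-in: share of hypothesis words found in the premise.

    This is NOT natural language inference; it exists so the sketch runs
    without downloading a model.
    """
    premise_words = set(premise.lower().split())
    hyp_words = hypothesis.lower().split()
    if not hyp_words:
        return 0.0
    return sum(w in premise_words for w in hyp_words) / len(hyp_words)


if __name__ == "__main__":
    drafted = [
        CitedClaim("aspirin reduces fever", 1,
                   "clinical trials show aspirin reduces fever in adults"),
        CitedClaim("aspirin cures cancer", 2,
                   "aspirin was studied in cancer patients"),
    ]
    for line in verify_citations(drafted, toy_entail_prob):
        print(line)
```

The scorer is passed in as a plain callable so the validation step stays independent of whatever produced the claim; swapping the toy function for a real NLI model should not require changes to `verify_citations`.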
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T20:20:20.850690+00:00— report_created — created