Report #72433
[research] LLM justifies a claim by citing a document that shares keywords but actually contradicts the claim or is irrelevant
Implement a secondary verification LLM call, acting as an NLI classifier, that takes each (Claim, Cited Document) pair and explicitly predicts Entailment, Contradiction, or Neutral. Reject the citation unless the prediction is Entailment.
Journey Context:
Retrievers (like BM25 or dense embedders) rank documents by keyword overlap or semantic similarity, not by logical entailment. A document saying 'Drug X is ineffective' will be retrieved for the query 'Drug X efficacy'. The generator might then blindly cite this document to support the claim 'Drug X is effective'. Relying on the generator to notice the contradiction is unreliable. An explicit Natural Language Inference (NLI) step, with the cited document as premise and the claim as hypothesis, acts as a strict logical firewall.
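A minimal sketch of the verification gate described above, under stated assumptions. The names `NLILabel`, `citation_is_supported`, `filter_citations`, and `toy_classifier` are hypothetical and not from the report. `toy_classifier` is a hard-coded stand-in that only handles the Drug X example; in practice it would be replaced by the secondary LLM call or a trained NLI model.

```python
from enum import Enum
from typing import Callable, Iterable, List


class NLILabel(Enum):
    ENTAILMENT = "entailment"
    CONTRADICTION = "contradiction"
    NEUTRAL = "neutral"


# Assumed interface: (premise, hypothesis) -> NLILabel.
# The cited document is the premise; the claim is the hypothesis.
NLIClassifier = Callable[[str, str], NLILabel]


def citation_is_supported(claim: str, document: str, classify: NLIClassifier) -> bool:
    """Accept the citation only if the document entails the claim."""
    return classify(document, claim) is NLILabel.ENTAILMENT


def filter_citations(
    claim: str, documents: Iterable[str], classify: NLIClassifier
) -> List[str]:
    """Drop every retrieved document that is Neutral or a Contradiction."""
    return [doc for doc in documents if citation_is_supported(claim, doc, classify)]


def toy_classifier(premise: str, hypothesis: str) -> NLILabel:
    """Illustrative stand-in for a real NLI model; NOT a general classifier."""
    p, h = premise.lower(), hypothesis.lower()
    if "drug x" not in p:
        return NLILabel.NEUTRAL  # premise is about something else
    # Same polarity means entailment; opposite polarity means contradiction.
    if ("ineffective" in p) == ("ineffective" in h):
        return NLILabel.ENTAILMENT
    return NLILabel.CONTRADICTION


if __name__ == "__main__":
    claim = "Drug X is effective."
    retrieved = [
        "Drug X is ineffective.",                     # shares keywords, contradicts
        "Trials show Drug X is effective in adults.", # supports the claim
        "Drug Y pricing changed last year.",          # irrelevant
    ]
    print(filter_citations(claim, retrieved, toy_classifier))
```

Passing the classifier in as a callable keeps the reject-unless-Entailment rule separate from whichever model produces the label, so the gate can be tested with a stub before a real model is wired in.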
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T04:09:55.809665+00:00— report_created — created