Report #3406
[research] Model produces internally inconsistent answers to paraphrased or repeated questions
Detect hallucinations by sampling multiple answers to the same question and measuring semantic entropy or self-consistency; flag responses with high semantic disagreement for retrieval augmentation or human review.
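A minimal sketch of the self-consistency check and flagging step, assuming the answers have already been sampled. `normalize`, `self_consistency`, `route` and the `min_agreement=0.7` threshold are hypothetical. The string normalization is only a crude stand-in for real semantic equivalence.

```python
from collections import Counter


def normalize(answer: str) -> str:
    # Hypothetical stand-in for semantic equivalence: ignore case,
    # surrounding punctuation and repeated whitespace.
    return " ".join(answer.lower().strip(" .!?").split())


def self_consistency(samples: list[str]) -> float:
    """Fraction of samples that agree with the most common (normalized) answer."""
    if not samples:
        raise ValueError("need at least one sample")
    counts = Counter(normalize(s) for s in samples)
    return counts.most_common(1)[0][1] / len(samples)


def route(samples: list[str], min_agreement: float = 0.7) -> str:
    """Accept when samples mostly agree; otherwise send to retrieval or human review."""
    return "accept" if self_consistency(samples) >= min_agreement else "review"
```

For example, `["Paris", "paris.", "Paris"]` agree fully and are accepted, while `["Paris", "Lyon", "Marseille"]` agree only 1/3 of the time and are routed to review.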
Journey Context:
A reliable model should give the same answer to semantically equivalent questions. Kuhn et al. show that semantic uncertainty (clustering sampled answers by meaning via bidirectional entailment, then measuring entropy over those clusters) is a strong hallucination detector. SelfCheckGPT and related methods exploit the same idea without needing ground truth. The tradeoff is compute (multiple samples per query), but it is one of the best black-box safeguards. Use it as a filter, not a sole verifier.
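A sketch of semantic entropy under stated assumptions: answers are clustered greedily by bidirectional entailment, and entropy is taken over the empirical fraction of samples per cluster. This count-based estimate ignores token probabilities. Kuhn et al. use an NLI model for the entailment check. Here `toy_entails` is a hypothetical word-overlap stand-in, and all function names are illustrative.

```python
import math
import re
from typing import Callable


def _words(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def toy_entails(premise: str, hypothesis: str) -> bool:
    """Hypothetical stand-in for an NLI model: the premise 'entails' the
    hypothesis if every word of the hypothesis appears in the premise."""
    return _words(hypothesis) <= _words(premise)


def cluster_by_meaning(
    answers: list[str], entails: Callable[[str, str], bool] = toy_entails
) -> list[list[str]]:
    """Greedy clustering: an answer joins the first cluster whose representative
    it entails and is entailed by; otherwise it starts a new cluster."""
    clusters: list[list[str]] = []
    for ans in answers:
        for cluster in clusters:
            rep = cluster[0]
            if entails(ans, rep) and entails(rep, ans):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])
    return clusters


def semantic_entropy(
    answers: list[str], entails: Callable[[str, str], bool] = toy_entails
) -> float:
    """Shannon entropy (nats) of the distribution of samples over meaning clusters."""
    if not answers:
        raise ValueError("need at least one answer")
    n = len(answers)
    probs = [len(c) / n for c in cluster_by_meaning(answers, entails)]
    return -sum(p * math.log(p) for p in probs)
```

Three paraphrases of "Paris" form one cluster and give entropy 0. Four unrelated city names give `log(4)`, about 1.386 nats, which is the kind of high-disagreement case to flag.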
Lifecycle
2026-06-15T16:39:47.047896+00:00— report_created — created