Report #97960
[research] Black-box LLM output is fluent but ungrounded, with no external source available to fact-check.
Use SelfCheckGPT-style self-consistency: sample multiple answers, measure semantic agreement, and flag sentences with low consistency as likely hallucinations.
Journey Context:
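Manakul et al. proposed SelfCheckGPT, a zero-resource, black-box method that detects hallucinations by checking whether multiple stochastically sampled responses agree on the same facts. The premise is that facts the model actually knows tend to recur across samples, while hallucinated ones diverge or contradict each other. The authors report that it outperforms grey-box baselines on sentence-level hallucination detection. The cost is extra sampling (several additional generations per answer), so it is best used as a filter for high-stakes closed-book outputs before they are shown to the user.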
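A minimal sketch of the sample-and-compare idea described above, not the authors' implementation. It assumes the caller has already drawn the extra samples from the model, and it swaps in a crude lexical-overlap score as a stand-in for the semantic scorers (such as BERTScore or NLI) used in the paper. All function names and the 0.5 threshold are hypothetical and chosen only for illustration.

```python
import re
from typing import List, Set, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support(sentence: str, sample: str) -> float:
    """Share of the sentence's tokens found in the best-matching sample sentence.

    ASSUMPTION: token overlap is only a rough proxy for semantic agreement.
    """
    sent_tokens = tokens(sentence)
    if not sent_tokens:
        return 1.0  # nothing to check, treat as supported
    best = 0.0
    for candidate in split_sentences(sample):
        overlap = len(sent_tokens & tokens(candidate)) / len(sent_tokens)
        best = max(best, overlap)
    return best


def inconsistency_scores(answer: str, samples: List[str]) -> List[Tuple[str, float]]:
    """Score each sentence of the answer in [0, 1]; higher means less agreement."""
    if not samples:
        raise ValueError("at least one sampled response is required")
    results = []
    for sentence in split_sentences(answer):
        mean_support = sum(support(sentence, s) for s in samples) / len(samples)
        results.append((sentence, 1.0 - mean_support))
    return results


def flag_low_consistency(
    answer: str, samples: List[str], threshold: float = 0.5
) -> List[str]:
    """Return sentences whose inconsistency score exceeds the (hypothetical) threshold."""
    return [s for s, score in inconsistency_scores(answer, samples) if score > threshold]


if __name__ == "__main__":
    answer = "Marie Curie was born in Warsaw. She won the Fields Medal in 1950."
    samples = [
        "Marie Curie was born in Warsaw in 1867. She won two Nobel Prizes.",
        "Born in Warsaw, Marie Curie won the Nobel Prize in Physics.",
        "Marie Curie was born in Warsaw. She received Nobel Prizes in physics and chemistry.",
    ]
    for sentence in flag_low_consistency(answer, samples):
        print("low consistency:", sentence)
```

In practice the threshold would need tuning on labeled data, and the overlap score would likely be replaced with a semantic scorer, since paraphrases that share few tokens are penalized here even when they agree.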
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T04:59:23.740838+00:00— report_created — created