Report #98448
[research] Hard to detect subtle hallucinations in a single generated response
Sample multiple responses to the same query and check semantic consistency. Factual claims tend to be stable across samples; hallucinated claims vary or contradict each other.
Journey Context:
Manakul et al. \(2023\) proposed SelfCheckGPT, a zero-resource black-box method that detects hallucination by measuring information consistency across multiple sampled responses. This works because factual knowledge is encoded consistently in the model, while hallucinated facts are stochastic and diverge. It is especially useful when you cannot access model logits or external databases.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-27T04:59:29.630590+00:00— report_created — created