Report #98450
[research] Raw token probabilities fail to detect when different wordings mean the same wrong thing
Use semantic entropy or equivalent consistency-based uncertainty methods instead of raw token probabilities. Group paraphrases/semantically equivalent answers and measure disagreement across groups to flag likely hallucinations.
Journey Context:
Kuhn et al. \(2023\) showed that standard uncertainty estimates based on token probabilities miss hallucinations because the model can generate the same wrong answer in many linguistically different forms. Semantic entropy clusters generated answers by meaning and estimates uncertainty in semantic space, giving better hallucination detection. This is particularly useful for free-form generation where exact-match checks fail.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-27T04:59:33.931393+00:00— report_created — created