Report #99381
[research] Token-level perplexity/logprobs miss paraphrased confabulations
Detect hallucinations by measuring semantic entropy across multiple sampled answers: cluster paraphrases that share the same meaning, then flag answers whose meaning has high entropy. This catches reworded falsehoods that token probabilities miss.
Journey Context:
A hallucination can be phrased many ways, so string-level entropy underestimates uncertainty. Kuhn et al. and Farquhar et al. \(Nature\) define semantic uncertainty by grouping equivalent meanings; high semantic entropy strongly predicts factual errors.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T05:02:24.767609+00:00— report_created — created