Report #92797
[research] Expressing high confidence in hallucinated or incorrect statements
Use token log-probabilities (logprobs) to gauge model uncertainty: if the distribution over the top candidate tokens is flat, or the top token's probability falls below a threshold, prepend a calibrated uncertainty disclaimer or trigger an 'I don't know' fallback.
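A minimal sketch of this gating step, assuming the serving API already returns, for each generated position, a list of log-probabilities for the top candidate tokens with the chosen token among them. The names `token_uncertainty` and `gate_answer`, the disclaimer text, and all three thresholds are hypothetical and would need tuning on held-out data:

```python
import math

# Hypothetical wording; not taken from the report.
DISCLAIMER = "I'm not fully certain about this: "
FALLBACK = "I don't know."

def token_uncertainty(top_logprobs):
    """Return (top probability, flatness in [0, 1]) for one token position.

    top_logprobs: log-probabilities of the top-k candidate tokens.
    Flatness is the entropy of the renormalized top-k distribution divided
    by its maximum possible value, so 1.0 means perfectly flat.
    """
    if not top_logprobs:
        return 0.0, 1.0  # no information: treat as maximally uncertain
    probs = [math.exp(lp) for lp in top_logprobs]
    total = sum(probs)
    norm = [p / total for p in probs]  # renormalize over the top-k only
    entropy = -sum(p * math.log(p) for p in norm if p > 0)
    max_entropy = math.log(len(norm)) if len(norm) > 1 else 1.0
    return max(probs), entropy / max_entropy

def gate_answer(answer, positions, min_top_prob=0.5, max_flatness=0.8,
                fallback_fraction=0.5):
    """Return the answer, a disclaimed answer, or the fallback.

    positions: one list of top-k logprobs per generated token.
    All thresholds are assumed values, not recommendations.
    """
    if not positions:
        return FALLBACK
    uncertain = 0
    for top_logprobs in positions:
        top_prob, flatness = token_uncertainty(top_logprobs)
        if top_prob < min_top_prob or flatness > max_flatness:
            uncertain += 1
    if uncertain / len(positions) >= fallback_fraction:
        return FALLBACK           # most tokens are uncertain
    if uncertain > 0:
        return DISCLAIMER + answer  # some tokens are uncertain
    return answer
```

The flatness check and the top-probability check are kept separate so each condition named in the workaround can be tuned independently.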
Journey Context:
LLMs are often poorly calibrated: their stated confidence ('I am certain') correlates only weakly with actual accuracy. RLHF can exacerbate this, making models sound confident even when they are wrong, so verbalized confidence on its own is an unreliable signal. Internal probability distributions (logprobs) or self-consistency (sampling multiple times and checking variance across the answers) provide a more quantitative measure of uncertainty.
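The self-consistency check can be sketched as follows. This version uses the agreement rate of the most common answer as a simple stand-in for variance across samples. `sample_fn` is a hypothetical callable that returns one sampled answer per call (for example, a wrapper around a model call with temperature above zero), and `n` and `min_agreement` are assumed values:

```python
from collections import Counter

def self_consistency(sample_fn, prompt, n=5, min_agreement=0.6):
    """Sample n answers and return (answer, agreement rate).

    Answers are compared after stripping whitespace and lowercasing, so the
    returned answer is the normalized form. Exact string matching only suits
    short answers; free-form text would need a semantic comparison instead.
    """
    answers = [sample_fn(prompt).strip().lower() for _ in range(n)]
    top_answer, count = Counter(answers).most_common(1)[0]
    agreement = count / n
    if agreement < min_agreement:
        return "I don't know.", agreement  # samples disagree too much
    return top_answer, agreement
```

Low agreement is treated here as a trigger for the same 'I don't know' fallback; a softer variant could return the majority answer with a disclaimer instead.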
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:20:54.306246+00:00— report_created — created