Report #60577
[research] LLM claims high confidence on answers that are factually incorrect
Do not rely on the model's self-reported confidence scores. Use generation probabilities (logprobs) or agreement across multiple samples (self-consistency) to gauge true confidence. Map logprobs to a calibrated uncertainty score, for example by fitting the mapping on a validation set of answers whose correctness is known.
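One way the logprob mapping could look is sketched below. It assumes you already have the per-token logprobs of the answer span as a list of floats, plus a small validation set labeled correct/incorrect. The function names (`answer_confidence`, `fit_histogram_bins`, `calibrate`) are hypothetical, and histogram binning is just one simple calibration method chosen for illustration, not something this report prescribes.

```python
import math
from typing import List, Optional, Sequence


def answer_confidence(token_logprobs: Sequence[float]) -> float:
    """Raw score: geometric-mean token probability of the answer span.

    Length-normalizing (mean instead of sum) keeps long answers from being
    penalized just for having more tokens.
    """
    if not token_logprobs:
        raise ValueError("need at least one token logprob")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def fit_histogram_bins(
    raw_scores: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 5,
) -> List[Optional[float]]:
    """Fit histogram binning: empirical accuracy per raw-score bin.

    Returns one entry per bin; None marks a bin with no validation data.
    """
    hits = [0] * n_bins
    counts = [0] * n_bins
    for score, is_correct in zip(raw_scores, correct):
        i = min(int(score * n_bins), n_bins - 1)  # clamp score == 1.0
        counts[i] += 1
        hits[i] += int(is_correct)
    return [h / n if n else None for h, n in zip(hits, counts)]


def calibrate(raw_score: float, bin_accuracy: Sequence[Optional[float]]) -> float:
    """Map a raw score to the validation accuracy observed in its bin."""
    n_bins = len(bin_accuracy)
    i = min(int(raw_score * n_bins), n_bins - 1)
    acc = bin_accuracy[i]
    # Assumption: fall back to the raw score when the bin had no data.
    return raw_score if acc is None else acc
```

With enough validation data, `calibrate(answer_confidence(logprobs), bins)` returns the fraction of validation answers that were actually correct at a similar raw score, which is usually a more honest number than the raw probability.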
Journey Context:
LLMs are notoriously poorly calibrated when asked to verbalize their confidence, and they will state falsehoods confidently. RLHF pushes models to sound authoritative. Verbalized confidence ('On a scale of 1-10...') correlates poorly with accuracy. True calibration requires looking under the hood at token probabilities or using self-consistency (generating N answers and checking whether they agree). If logprobs are unavailable, self-consistency is the only reliable proxy.
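The self-consistency check described above can be sketched as follows. `sample_answer` is a hypothetical zero-argument callable that you would wrap around your own model call. The normalization step is an assumption that suits short factual answers. Free-form answers would need a task-specific comparison instead of exact string matching.

```python
from collections import Counter
from typing import Callable, Tuple


def normalize(answer: str) -> str:
    """Collapse case and whitespace so trivially different strings match."""
    return " ".join(answer.strip().lower().split())


def self_consistency(
    sample_answer: Callable[[], str],
    n: int = 10,
) -> Tuple[str, float]:
    """Sample n answers and return (majority answer, agreement fraction).

    The agreement fraction is the share of samples that match the most
    common answer; low agreement signals low confidence.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    votes = Counter(normalize(sample_answer()) for _ in range(n))
    answer, count = votes.most_common(1)[0]
    return answer, count / n
```

The samples need to come from stochastic decoding (a non-zero temperature); with deterministic decoding the N answers will typically be identical and the agreement score tells you nothing.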
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:09:52.036824+00:00— report_created — created