Report #13198
[research] LLM claims high confidence in natural language but is factually incorrect
Do not rely on verbalized confidence scores for calibration; use token probabilities (logprobs) or external tool validation to assess uncertainty, and map logprobs to verbalized statements via a calibrated scaling function.
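One way the logprob-to-statement mapping could look is sketched below. It is an illustration under stated assumptions, not a reference implementation: the raw score is taken to be the geometric-mean token probability, the scaling is a Platt-style logistic transform, and the parameters `a` and `b`, the bucket cut-offs, and all function names are hypothetical. In practice `a` and `b` would have to be fit on a labeled held-out set before the output means anything.

```python
import math


def mean_token_prob(token_logprobs):
    """Geometric-mean token probability: exp of the average log-probability."""
    if not token_logprobs:
        raise ValueError("need at least one token logprob")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def calibrate(raw_conf, a=1.0, b=0.0):
    """Platt-style scaling applied to the logit of raw_conf.

    a and b are placeholders (assumption): fit them on held-out answers
    labeled correct/incorrect. With a=1, b=0 the score is unchanged;
    a < 1 pulls overconfident scores toward 0.5.
    """
    eps = 1e-6
    p = min(max(raw_conf, eps), 1.0 - eps)  # keep the logit finite
    logit = math.log(p / (1.0 - p))
    return 1.0 / (1.0 + math.exp(-(a * logit + b)))


def verbalize(calibrated_conf):
    """Map a calibrated probability to a phrase (cut-offs are illustrative)."""
    if calibrated_conf >= 0.9:
        return "almost certainly"
    if calibrated_conf >= 0.7:
        return "probably"
    if calibrated_conf >= 0.4:
        return "possibly"
    return "I don't know"


if __name__ == "__main__":
    logprobs = [-0.05, -0.10, -0.30]  # made-up per-token logprobs
    raw = mean_token_prob(logprobs)
    print(verbalize(calibrate(raw, a=0.5)))
```

The point of the design is that the phrase shown to the user is derived from a measured, calibrated number rather than from the model's own description of how sure it is.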
Journey Context:
LLMs are poorly calibrated when asked to self-report confidence in natural language: they tend to overstate confidence, especially for fluent but hallucinated outputs, so verbalized uncertainty correlates poorly with actual accuracy. Token log-probabilities (logprobs) taken from the model's output distribution provide a much better (though still imperfect) signal of true uncertainty, which can then be thresholded to trigger an 'I don't know' response.
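The thresholding step described above can be sketched as follows. This is a minimal example that assumes the per-token logprobs of the answer have already been retrieved from whatever inference API is in use; the function name `answer_or_abstain`, the choice of the weakest token as the signal, and the 0.6 threshold are all assumptions made for illustration and would need tuning against real accuracy data.

```python
import math

ABSTAIN = "I don't know"


def answer_or_abstain(answer, token_logprobs, threshold=0.6):
    """Return the answer only if its least likely token clears the threshold.

    token_logprobs: natural-log probabilities of the generated answer tokens.
    threshold: hypothetical cut-off on probability, not a recommended value.
    """
    if not token_logprobs:
        return ABSTAIN  # no signal available, so do not claim confidence
    weakest = min(math.exp(lp) for lp in token_logprobs)
    return answer if weakest >= threshold else ABSTAIN


if __name__ == "__main__":
    # Made-up logprobs: one confident answer, one with a shaky token.
    print(answer_or_abstain("Paris", [-0.01, -0.02]))
    print(answer_or_abstain("Lyon", [-0.01, -2.3]))
```

Using the minimum rather than the average makes a single low-probability token enough to abstain, which may be too strict for long answers; averaging over tokens is a reasonable alternative.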
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T18:10:32.645814+00:00— report_created — created