Report #2866
[research] LLM confidence statements are miscalibrated, leading to overconfident wrong answers
Elicit an explicit uncertainty expression ('I don't know', or a low/medium/high rating) and set an abstention threshold from empirical calibration on a validation set. When confidence falls below the threshold, or no supporting source is found, abstain rather than risk a hallucinated answer.
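The thresholding step can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names, the `LEVEL_SCORE` mapping from verbal ratings to ordinal scores, and the default `target_precision` are all assumptions made for the example. It assumes you already have a validation set of (confidence, was_correct) pairs.

```python
from typing import List, Optional

# Hypothetical mapping from verbalized ratings to ordinal scores (an assumption).
LEVEL_SCORE = {"low": 0.0, "medium": 1.0, "high": 2.0}


def pick_abstention_threshold(
    confidences: List[float],
    correct: List[bool],
    target_precision: float = 0.9,
) -> Optional[float]:
    """Return the lowest threshold whose answered subset (confidence >= threshold)
    reaches target_precision on the validation set, or None if none does."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    # Walk from most to least confident, tracking running precision.
    pairs = sorted(zip(confidences, correct), key=lambda p: p[0], reverse=True)
    best = None
    n_correct = 0
    for i, (conf, ok) in enumerate(pairs, start=1):
        n_correct += int(ok)
        # Only evaluate once every item tied at this confidence is included.
        is_boundary = i == len(pairs) or pairs[i][0] != conf
        if is_boundary and n_correct / i >= target_precision:
            best = conf
    return best


def answer_or_abstain(
    answer: str,
    confidence: float,
    threshold: Optional[float],
    has_source: bool = True,
) -> str:
    """Abstain when no threshold qualified, no source was found,
    or confidence is below the threshold."""
    if threshold is None or not has_source or confidence < threshold:
        return "I don't know"
    return answer
```

With verbal ratings, pass `LEVEL_SCORE[rating]` as the confidence. If no threshold reaches the target on validation data, the sketch abstains on everything, which is a deliberately conservative choice.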
Journey Context:
Raw token probabilities and verbalized confidence are both often poorly calibrated, especially after RLHF. Even so, models can learn to express uncertainty in words, and that self-assessment correlates with correctness when measured against labeled outcomes. The tradeoff is coverage versus precision: a higher threshold generally answers fewer questions but gets a larger share of them right. Tuning the threshold on your own task is what gives calibrated reliability.
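To make the calibration and coverage/precision claims measurable, one option is to compute expected calibration error (ECE) and the coverage and precision at a candidate threshold. The sketch below is illustrative only: the function names and the equal-width binning with `n_bins=10` are assumptions, and confidences are assumed to lie in [0, 1].

```python
from typing import List, Optional, Tuple


def expected_calibration_error(
    confidences: List[float], correct: List[bool], n_bins: int = 10
) -> float:
    """Weighted average gap between mean confidence and accuracy per bin."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need non-empty inputs of equal length")
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of 1.0 goes in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, ok))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / total) * abs(avg_conf - accuracy)
    return ece


def coverage_and_precision(
    confidences: List[float], correct: List[bool], threshold: float
) -> Tuple[float, Optional[float]]:
    """Coverage is the fraction of questions answered (confidence >= threshold);
    precision is the fraction of answered questions that were correct,
    or None when nothing was answered."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need non-empty inputs of equal length")
    answered = [ok for c, ok in zip(confidences, correct) if c >= threshold]
    coverage = len(answered) / len(confidences)
    precision = sum(answered) / len(answered) if answered else None
    return coverage, precision
```

Sweeping `threshold` over the validation set and recording both numbers shows how much coverage is given up for each gain in precision on a particular task.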
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T14:31:03.980439+00:00— report_created — created