Report #44115
[counterintuitive] If the model expresses high confidence in its answer, the answer is more likely to be correct
Never use the model's expressed confidence (verbal certainty, hedging language) as a reliability signal. Instead, use external verification: run tests, check against references, use multiple model calls with different framings, or employ calibrated probability estimates from logprobs where available.
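As one way to picture the "run tests" option, here is a minimal sketch that accepts or rejects generated code purely on whether it passes assertions. The helper name `passes_tests` and the two sample snippets are hypothetical illustrations, not part of any library or of the original report. It assumes the generated code is a standalone Python snippet.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def passes_tests(generated_code: str, test_code: str, timeout_s: float = 10.0) -> bool:
    """Return True only if the generated code passes the given assertions.

    The model's own statements about its confidence are never consulted.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        # Candidate first, then the tests, so the tests can call into it.
        script.write_text(generated_code + "\n\n" + test_code + "\n", encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # A hang counts as a failure, however sure the model sounded.
            return False
        # A failed assert raises AssertionError, giving a non-zero exit code.
        return result.returncode == 0


# Hypothetical model outputs: the wording around them is irrelevant here.
confident_but_wrong = "def add(a, b):\n    return a - b\n"
hedged_but_right = "def add(a, b):\n    return a + b\n"
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"

print(passes_tests(confident_but_wrong, tests))  # False
print(passes_tests(hedged_but_right, tests))     # True
```

A subprocess is not a sandbox. Code from an untrusted model should run in an isolated environment (container, VM, or similar) before anything like this is used in practice.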
Journey Context:
Humans naturally read confident language as a sign of reliability. When an LLM says 'I'm certain that...' or answers without hedging, developers tend to trust it more. Research suggests this is a mistake: an LLM's verbal confidence is poorly calibrated against actual correctness. Kadavath et al. (2022) found that models can be trained to express meaningful uncertainty in certain constrained settings, but in default open-ended generation the confidence expressed in the text carries little information: the model can sound just as certain about an incorrect answer as about a correct one. The reason is that the model is trained to produce fluent, helpful-sounding text, not to assess its own certainty accurately, so its text output does not reliably reflect 'what it doesn't know'. The practical implication: when building AI coding agents, never use the model's self-assessed confidence to decide whether to trust an answer. Instead, verify externally: run the generated code, check whether tests pass, compare against documentation, or use logprob-based calibration if the API provides it.
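To check whether a given confidence signal is calibrated on your own tasks, one common measure is expected calibration error (ECE): bucket the answers by stated confidence and compare each bucket's average confidence with its actual accuracy. The sketch below assumes you already have a labeled evaluation set. The function name and the sample numbers are hypothetical, and the confidences could come from any source (a verbal rating mapped to a number, or a probability derived from logprobs).

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 5,
) -> float:
    """Weighted average gap between stated confidence and observed accuracy."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one sample")

    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError(f"confidence out of range: {conf}")
        # Equal-width bins over [0, 1]; a confidence of 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece


# Made-up data: the model claims 90% confidence every time but is right half the time.
stated = [0.9, 0.9, 0.9, 0.9]
was_correct = [True, False, True, False]
print(round(expected_calibration_error(stated, was_correct), 3))  # 0.4
```

An ECE near zero means the signal tracks accuracy on that evaluation set. A large value, as in the made-up data here, means the stated confidence should not drive trust decisions.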
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:31:05.877785+00:00— report_created — created