Report #91099
[counterintuitive] Model confidence (logprobs or verbalized certainty) reliably indicates answer correctness
Do not use model confidence as a proxy for factual accuracy. High confidence does not mean the answer is correct, and low confidence does not mean it is wrong. If you need reliability signals, use external verification (execution, retrieval, human review) rather than the model's own confidence assessments.
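As a minimal sketch of verification by execution: the snippet below ignores self-reported confidence entirely and accepts a generated function only if it reproduces known input/output pairs. The function names, the confidence figures in the comments, and the test cases are all hypothetical, chosen for illustration. In practice, untrusted generated code should run in a sandbox, which this sketch does not attempt.

```python
from typing import Callable, Iterable, Tuple


def verify_by_execution(
    candidate: Callable[[int], int],
    cases: Iterable[Tuple[int, int]],
) -> bool:
    """Return True only if the candidate reproduces every (input, expected) pair."""
    for arg, expected in cases:
        try:
            if candidate(arg) != expected:
                return False
        except Exception:
            # A crash counts as a failed verification, not as "unknown".
            return False
    return True


# Hypothetical model outputs for the task "compute 2**n".
def confident_but_wrong(n: int) -> int:
    # Imagine the model reported 0.97 confidence for this answer.
    return n * n


def hesitant_but_right(n: int) -> int:
    # Imagine the model reported 0.55 confidence for this answer.
    return 1 << n


# Known-good reference pairs; these are the reliability signal, not the confidence.
CASES = [(0, 1), (1, 2), (3, 8), (10, 1024)]

if __name__ == "__main__":
    for fn in (confident_but_wrong, hesitant_but_right):
        print(fn.__name__, verify_by_execution(fn, CASES))
```

The accept/reject decision depends only on observed behavior, so a confidently stated wrong answer is rejected and a hesitantly stated correct one is accepted.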
Journey Context:
Developers often check logprobs or ask models 'how confident are you?' to gauge reliability, treating confidence like a measure of statistical significance. But logprobs measure how likely a token sequence is under the distribution the model learned from its training data, not whether the sequence is factually correct. A fluent, common-sounding falsehood can receive higher logprobs than an awkward but true statement. Verbalized confidence is even less reliable: the model generates confidence statements the same way it generates everything else, by predicting what a confident or uncertain response looks like. Research shows models are somewhat calibrated on topics they 'know well' but poorly calibrated on their own errors, which are precisely the cases where a confidence signal would be most valuable. The model cannot reliably distinguish 'I know this' from 'this sounds right'.
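One way to check whether confidence tracks correctness on your own workload is to measure calibration against externally verified labels. The sketch below computes a binned expected calibration error (ECE), a standard calibration metric swapped in here for illustration; the function name, bin count, and sample values are assumptions, not data from this report. A low ECE on a labeled set still says nothing about any individual unverified answer.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Weighted mean gap between stated confidence and observed accuracy per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one sample")

    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 falls in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece


if __name__ == "__main__":
    # Hypothetical toy sample: four answers stated at 0.9 confidence,
    # only one of which passed external verification.
    stated = [0.9, 0.9, 0.9, 0.9]
    verified = [True, False, False, False]
    print(expected_calibration_error(stated, verified))
```

The `correct` labels must come from an external check (execution, retrieval, human review); using the model's own judgment to label its answers would make the measurement circular.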
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T11:30:24.841509+00:00— report_created — created