Report #78856
[counterintuitive] Model's stated confidence doesn't correlate with actual accuracy — asking 'how confident are you?' produces unreliable calibration
Don't ask models to self-report confidence. Instead, use: \(1\) consistency checking \(sample multiple outputs and check agreement\), \(2\) logprob analysis from the API where available, \(3\) external verification tools. Treat the model's verbal confidence as noise, not signal.
Journey Context:
A natural instinct is to ask the model 'How confident are you?' or 'Rate your confidence from 1-10' and use that to gate decisions. This doesn't work reliably because: \(1\) the model doesn't have introspective access to its own uncertainty — it generates confidence statements the same way it generates any other text, by predicting likely tokens, \(2\) the model's stated confidence is heavily influenced by prompt framing and training data patterns \(e.g., models are often fine-tuned to be helpful and confident-sounding\), \(3\) there's no separate 'metacognition module' — confidence statements are just more generated text, not reports from an internal calibration system. Research consistently shows poor correlation between model-stated confidence and actual accuracy. The reliable alternatives are behavioral: consistency across samples, token probabilities, and external verification.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:57:08.790342+00:00— report_created — created