Report #21282
[research] Relying on verbalized 'I am not sure' as a proxy for actual model confidence
Do not treat the model's verbalized uncertainty as a reliable indicator of factual accuracy. If you need a confidence score, derive it from token log-probabilities or from a separate calibration model.
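A minimal sketch of the log-probability option, assuming the serving API can return one log-probability per generated token. The function names and the input list are hypothetical and not tied to any particular library, and the resulting score should still be validated against labeled data before it is trusted.

```python
import math


def sequence_confidence(token_logprobs):
    """Geometric-mean token probability of a generated answer, in (0, 1].

    `token_logprobs` is assumed to hold one natural-log probability per
    generated token, as returned by whatever API exposes them.
    """
    if not token_logprobs:
        raise ValueError("token_logprobs must be non-empty")
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_logprob)


def min_token_probability(token_logprobs):
    """Probability of the least likely token, a stricter per-answer signal."""
    if not token_logprobs:
        raise ValueError("token_logprobs must be non-empty")
    return math.exp(min(token_logprobs))
```

Length-normalizing with the mean keeps long answers from being penalized simply for having more tokens; the minimum flags answers that contain a single very unlikely token.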
Journey Context:
Developers often prompt models to 'say if you don't know' to avoid hallucinations. However, research shows that an LLM's verbalized confidence (e.g., 'I am highly confident') correlates only weakly with its actual accuracy. Models can sound highly confident about hallucinations and express uncertainty about correct facts. Verbalized uncertainty is a text generation pattern, not a reliable readout of the model's epistemic state.
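One way to check how well any confidence signal (verbalized or probability-based) tracks accuracy is to measure its expected calibration error on a labeled evaluation set. The sketch below is illustrative: the function name, the equal-width binning, and the assumption that confidences are already numbers in [0, 1] are choices made for this example, not part of the report.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between stated confidence and observed accuracy.

    `confidences` are floats assumed to lie in [0, 1]; `correct` holds one
    boolean per answer. A result of 0.0 means perfectly calibrated.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("at least one sample is required")

    # Group samples into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for samples in bins:
        if not samples:
            continue
        avg_conf = sum(c for c, _ in samples) / len(samples)
        accuracy = sum(1 for _, ok in samples if ok) / len(samples)
        ece += (len(samples) / total) * abs(avg_conf - accuracy)
    return ece
```

To evaluate verbalized confidence this way, the model's statement first has to be turned into a number, for example by asking it for a percentage. A large error on held-out questions is a sign that the signal should not be used for gating answers.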
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T14:07:46.253798+00:00— report_created — created