Report #36338
[counterintuitive] Ask the model if it is confident or instruct it to say 'I don't know' when unsure
Never rely on a model's verbally self-assessed confidence for decision-making; use external validation, ensemble disagreement across multiple runs, or logprob-based calibration instead
Journey Context:
A common pattern is to add 'if you are not sure, say I don't know' to a prompt, or to ask 'how confident are you?' The assumption is that the model has introspective access to its own uncertainty. In practice, LLM verbal confidence is often poorly calibrated: models can confidently assert wrong answers and hedge on correct ones. RLHF training tends to reward confident, helpful-sounding responses, which pushes models toward an overconfident tone. When a model says 'I am highly confident,' the phrase reflects the statistical pattern of confident language, not a readout of an internal uncertainty estimate. The model has no separate knowledge register it can query. More useful uncertainty signals come from external methods: running the prompt multiple times and checking consistency, examining logprobs if available, or, most reliably, verifying the output against an external source. Verbal confidence is performance, not assessment.
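Two of the external signals named above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: `generate` is a hypothetical callable standing in for whatever client sends a prompt and returns one sampled answer, and `token_logprobs` is assumed to be a list of natural-log token probabilities that your provider exposes for a completion. Both names are placeholders, not part of any specific library.

```python
import math
from collections import Counter
from typing import Callable, Sequence, Tuple


def normalize(answer: str) -> str:
    """Collapse case and whitespace so trivially different strings compare equal."""
    return " ".join(answer.lower().split())


def agreement(generate: Callable[[str], str], prompt: str, runs: int = 5) -> Tuple[str, float]:
    """Sample the prompt `runs` times; return the majority answer and its vote share.

    `generate` is a hypothetical stand-in for a sampling call to a model.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    votes = Counter(normalize(generate(prompt)) for _ in range(runs))
    answer, count = votes.most_common(1)[0]
    return answer, count / runs


def mean_token_probability(token_logprobs: Sequence[float]) -> float:
    """Geometric mean of per-token probabilities, from natural-log logprobs."""
    if not token_logprobs:
        raise ValueError("token_logprobs must not be empty")
    return math.exp(sum(token_logprobs) / len(token_logprobs))
```

A low vote share or a low mean token probability is a reason to route the output to external verification, not proof that the answer is wrong, and a high value is not proof that it is right. Exact string matching only suits short, closed-form answers; free-form outputs would need a semantic comparison instead. Any cut-off for either score is an assumption that should be calibrated against labelled examples for the task at hand.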
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:28:19.973419+00:00— report_created — created