Report #54282
[research] Model fails to express calibrated uncertainty, giving high-confidence wrong answers instead of saying 'I don't know'
Use explicit chain-of-thought prompting that requires the model to assess its own confidence before answering. Instruct the model: 'First, assess if you have sufficient information to answer accurately. If not, output UNCERTAIN: [brief reason].'
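Below is a minimal sketch of the prompt wrapper and a parser for the reply. The instruction text comes from the workaround above. The ANSWER: marker and the names build_prompt, parse_reply and ParsedReply are hypothetical conventions added here so the final answer can be separated from the reasoning. No model API is called: pass the output of build_prompt to whatever client you already use.

```python
import re
from typing import NamedTuple, Optional

# First two sentences are the instruction from the report; the ANSWER: line
# is a hypothetical convention so the final answer can be parsed reliably.
UNCERTAINTY_INSTRUCTION = (
    "First, assess if you have sufficient information to answer accurately. "
    "If not, output UNCERTAIN: [brief reason]. "
    "Otherwise, write your reasoning, then give the final answer on a line "
    "starting with ANSWER:."
)

class ParsedReply(NamedTuple):
    uncertain: bool
    text: Optional[str]  # reason if uncertain, answer otherwise, None if no marker found

def build_prompt(question: str) -> str:
    """Prepend the self-assessment instruction to the user question."""
    return f"{UNCERTAINTY_INSTRUCTION}\n\nQuestion: {question}"

# [ \t]* instead of \s* so an empty reason cannot swallow the next line.
_UNCERTAIN_RE = re.compile(r"^[ \t]*UNCERTAIN:[ \t]*(.*)$", re.MULTILINE)
_ANSWER_RE = re.compile(r"^[ \t]*ANSWER:[ \t]*(.*)$", re.MULTILINE)

def parse_reply(reply: str) -> ParsedReply:
    """Classify a model reply as an abstention or an answer."""
    match = _UNCERTAIN_RE.search(reply)
    if match:
        return ParsedReply(True, match.group(1).strip())
    match = _ANSWER_RE.search(reply)
    # No marker at all: report it as unparseable instead of trusting the text.
    return ParsedReply(False, match.group(1).strip() if match else None)
```

For example, parse_reply("The question names no year.\nUNCERTAIN: missing date") returns ParsedReply(uncertain=True, text='missing date'). Treating a reply with neither marker as unparseable (text=None), rather than as a confident answer, keeps format drift from silently hiding the failure this report describes.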
Journey Context:
LLMs are often poorly calibrated: their softmax probabilities do not reliably track the likelihood that an answer is correct. Simply asking 'are you sure?' tends to trigger sycophancy rather than a genuine re-check (the model doubles down or flips its answer without new evidence). The workaround separates the uncertainty assessment from final answer generation, forcing an explicit metacognitive step. However, leaning too heavily on 'I don't know' (IDK) raises the false-negative rate (the model refuses questions it could answer correctly), so the instruction must be tuned per domain.
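One way to do the per-domain tuning is to measure both failure modes on a small labeled set for each domain. The sketch below assumes every item records whether the model abstained (replied UNCERTAIN) and whether it "knows" the answer. "Knows" is approximated here, as an assumption not taken from the report, by whether the model answered correctly when the UNCERTAIN option was not offered. EvalItem, abstention_report and the domain names in the tests are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

@dataclass(frozen=True)
class EvalItem:
    domain: str
    abstained: bool  # reply was UNCERTAIN under the self-assessment prompt
    known: bool      # assumed proxy: baseline answer (no IDK option) was correct

def abstention_report(items: Iterable[EvalItem]) -> Dict[str, Dict[str, Optional[float]]]:
    """Per-domain false-refusal and confident-error rates."""
    counts = defaultdict(lambda: {"known": 0, "refused_known": 0,
                                  "unknown": 0, "answered_unknown": 0})
    for item in items:
        c = counts[item.domain]
        if item.known:
            c["known"] += 1
            c["refused_known"] += int(item.abstained)
        else:
            c["unknown"] += 1
            c["answered_unknown"] += int(not item.abstained)
    report = {}
    for domain, c in counts.items():
        report[domain] = {
            # Answerable questions the model refused: the over-IDK false negatives.
            "false_refusal_rate": c["refused_known"] / c["known"] if c["known"] else None,
            # Unanswerable questions answered anyway: the original failure.
            "confident_error_rate": c["answered_unknown"] / c["unknown"] if c["unknown"] else None,
        }
    return report
```

A domain with a high false_refusal_rate suggests loosening the instruction there; a domain whose confident_error_rate stays high suggests tightening it. A rate is None when the domain has no items of that kind, so it cannot be read as zero by mistake.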
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T21:36:40.687774+00:00— report_created — created