Report #29226
[gotcha] AI's expressed confidence does not reliably predict answer correctness (calibration gap)
Never treat the AI's hedging or confident wording as a signal of answer accuracy when making UX decisions. If you need a confidence signal, use an explicit calibration technique (logprob-based scoring, self-consistency checks, multi-sample verification). In the UI, avoid visual confidence indicators derived from the model's tone; instead, surface verification status based on external validation (tests passing, sources cited, cross-checks).
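As a rough illustration of the self-consistency and external-validation ideas above, the sketch below scores agreement across several sampled answers and derives a UI status from checks that do not depend on the model's tone. The function names, the 0.8 threshold, and the status labels are hypothetical choices for this example, and the samples are assumed to have been collected already by asking the same question several times.

```python
from collections import Counter
from typing import Optional


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(answer.lower().split())


def self_consistency(samples: list[str]) -> tuple[str, float]:
    """Return the most common normalized answer and the share of samples that agree with it."""
    if not samples:
        raise ValueError("need at least one sample")
    counts = Counter(normalize(s) for s in samples)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(samples)


def verification_status(
    tests_passed: Optional[bool],
    sources_cited: int,
    agreement: float,
    threshold: float = 0.8,  # hypothetical cutoff, tune on your own data
) -> str:
    """Pick a UI label from external checks only; the model's wording is never an input."""
    if tests_passed is False:
        return "failed checks"
    if tests_passed is True or sources_cited > 0:
        return "verified"
    if agreement >= threshold:
        return "consistent, not verified"
    return "unverified"


answer, agreement = self_consistency(["Paris", " paris ", "Lyon"])
status = verification_status(tests_passed=None, sources_cited=0, agreement=agreement)
```

Note that high agreement is deliberately labeled "consistent, not verified": a model can repeat the same popular misconception on every sample, so agreement alone is weaker evidence than a passing test or a cited source.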
Journey Context:
Users naturally interpret confident language ('definitely,' 'clearly') as a signal of accuracy and hedging language ('perhaps,' 'might') as a signal of uncertainty. The confidence an LLM expresses in its wording is poorly calibrated: it can state wrong answers with high confidence and hedge correct ones. A confidently stated wrong answer is more dangerous than a hedged one, because users trust the confidence and act on the answer without verifying it. The result is a systematic UX failure in which the AI's most dangerous outputs are the ones users trust most. The calibration gap is worst for questions where the model's training data is sparse or where common misconceptions exist, because the model confidently reproduces popular but wrong information. UI that amplifies the model's expressed confidence (e.g., confidence bars, checkmarks) actively harms user outcomes.
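One way to check whether a candidate confidence signal suffers from this gap is to measure it on a labeled evaluation set before showing it to users. The sketch below computes expected calibration error (ECE), a standard binned measure of the difference between stated confidence and observed accuracy. It assumes you already have a numeric confidence in [0, 1] for each answer (for example from logprobs or a mapped confidence phrase) and a correctness label from external grading; the function name and the default bin count are illustrative.

```python
def expected_calibration_error(
    confidences: list[float],
    correct: list[bool],
    n_bins: int = 10,
) -> float:
    """Weighted average of |accuracy - mean confidence| over equal-width confidence bins."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one example")

    # Group (confidence, correctness) pairs into equal-width bins over [0, 1].
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece


# Four answers all stated at 0.9 confidence, only one of them correct.
overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, False, False, False])
```

A value near 0 means stated confidence tracks accuracy on that set; a large value, as in the overconfident example, suggests the signal should not drive a confidence bar or checkmark.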
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:26:53.097736+00:00— report_created — created