Report #51838
[synthesis] Users stop trusting all AI outputs after encountering a few confident wrong answers even when most outputs are correct
Never display raw model confidence to users. Instead, implement calibrated confidence signals that are systematically under-confident on hard inputs, and calibrate them separately for each domain. When confidence is low, show structured alternatives instead of a single authoritative answer.
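The recommendation above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `DomainCalibrator`, `present`, the 0.8 threshold, the bin count, and the domain names are all hypothetical, and the Wilson lower bound is only one possible way to make a signal deliberately under-confident when a domain has little evidence.

```python
import math
from dataclasses import dataclass, field


@dataclass
class DomainCalibrator:
    """Hypothetical per-domain bucket: raw-confidence bin -> observed outcomes."""
    n_bins: int = 5
    correct: list = field(init=False)
    total: list = field(init=False)

    def __post_init__(self):
        self.correct = [0] * self.n_bins
        self.total = [0] * self.n_bins

    def _bin(self, raw_conf: float) -> int:
        return min(int(raw_conf * self.n_bins), self.n_bins - 1)

    def record(self, raw_conf: float, was_correct: bool) -> None:
        """Log one verified outcome for an answer the model gave at raw_conf."""
        b = self._bin(raw_conf)
        self.total[b] += 1
        self.correct[b] += int(was_correct)

    def conservative(self, raw_conf: float, z: float = 1.96) -> float:
        """Wilson lower bound on observed accuracy in this bin.

        Returns 0.0 when the bin has no data, so unseen regions are
        treated as low confidence (the under-confident default).
        """
        b = self._bin(raw_conf)
        n = self.total[b]
        if n == 0:
            return 0.0
        p = self.correct[b] / n
        denom = 1 + z * z / n
        centre = p + z * z / (2 * n)
        margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
        return (centre - margin) / denom


def present(answer, alternatives, raw_conf, domain, calibrators, threshold=0.8):
    """Decide how to show an answer; the raw number is never returned."""
    cal = calibrators.get(domain)
    score = cal.conservative(raw_conf) if cal else 0.0  # unknown domain: hedge
    if score >= threshold:
        return {"mode": "single", "answer": answer, "label": "likely correct"}
    return {
        "mode": "alternatives",
        "options": [answer, *alternatives],
        "label": "please verify",
    }


# Usage with made-up history: the same raw confidence behaves differently per domain.
calibrators = {"sql": DomainCalibrator(), "legal": DomainCalibrator()}
for i in range(200):
    calibrators["sql"].record(0.95, was_correct=(i >= 4))    # 196 of 200 correct
for i in range(20):
    calibrators["legal"].record(0.95, was_correct=(i >= 8))  # 12 of 20 correct

sql_view = present("SELECT ...", ["alt query"], 0.95, "sql", calibrators)
legal_view = present("Clause A applies", ["Clause B applies"], 0.95, "legal", calibrators)
unknown_view = present("x", ["y"], 0.99, "medical", calibrators)
```

With this made-up history, a raw confidence of 0.95 yields a single answer in the `sql` bucket but structured alternatives in the `legal` bucket, and a domain with no history always falls back to alternatives.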
Journey Context:
LLMs are often poorly calibrated on their failure modes: they tend to be confidently wrong on exactly the inputs where calibration matters most. Kadavath et al. (2022) report that large models can be reasonably well calibrated on some in-distribution task formats, but that calibration degrades out of distribution. The cascade effect: a user encounters a confident wrong answer, and the damage is not limited to that answer; it destroys trust in the confidence signal itself. Once the confidence signal is untrusted, the user can no longer triage outputs by accepting high-confidence answers and verifying low-confidence ones, so all outputs become equally suspect. The product's value collapses because the user must verify everything, which eliminates the productivity benefit of the AI. Teams commonly try to fix this by adding confidence percentages or disclaimers, but these make the problem worse if the confidence signal itself is miscalibrated. The right call is systematic under-confidence: it is better for the AI to hedge on a correct answer than to be confidently wrong, because the trust cost of a confident error is far larger than the cost of an unnecessary hedge and is, in practice, very hard to reverse.
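Whether a confidence signal is miscalibrated can be checked before it is ever shown to users. The sketch below computes the standard expected calibration error (ECE) per domain from logged outcomes; the record layout `(domain, raw_confidence, was_correct)` and the function names are assumptions made for illustration, not part of this report.

```python
from collections import defaultdict


def expected_calibration_error(records, n_bins=10):
    """ECE over (raw_confidence, was_correct) pairs.

    For each confidence bin, take |mean confidence - accuracy| and
    weight it by the share of records that fall in that bin.
    """
    bins = [[0, 0.0, 0] for _ in range(n_bins)]  # count, sum of conf, n correct
    for conf, ok in records:
        b = min(int(conf * n_bins), n_bins - 1)
        bins[b][0] += 1
        bins[b][1] += conf
        bins[b][2] += int(ok)
    n = sum(b[0] for b in bins)
    if n == 0:
        return 0.0
    return sum(
        abs(conf_sum / cnt - n_ok / cnt) * cnt / n
        for cnt, conf_sum, n_ok in bins
        if cnt
    )


def ece_by_domain(records, n_bins=10):
    """Group (domain, raw_confidence, was_correct) records and score each domain."""
    grouped = defaultdict(list)
    for domain, conf, ok in records:
        grouped[domain].append((conf, ok))
    return {d: expected_calibration_error(r, n_bins) for d, r in grouped.items()}


# Made-up log: same stated confidence, very different reliability.
log = (
    [("sql", 0.9, True)] * 9 + [("sql", 0.9, False)] * 1
    + [("legal", 0.9, True)] * 5 + [("legal", 0.9, False)] * 5
)
scores = ece_by_domain(log)
```

In this invented log the `sql` bucket scores close to zero, while the `legal` bucket scores about 0.4 at the same stated confidence, which is the kind of gap that would argue against showing one shared percentage across domains.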
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:30:14.038770+00:00— report_created — created