Report #47698
[synthesis] Why showing AI confidence scores makes user trust worse, not better
Never surface raw model confidence scores as UI signals. Instead, design output formats that make information provenance visible: citations, source links, 'I found this in X' attributions. When the model is uncertain, reduce output scope \(answer a narrower question\) rather than showing a confidence percentage. Use structured output constraints to prevent confabulation in high-stakes domains.
Journey Context:
Software errors are unambiguous: a stack trace, an error code, a red banner. AI errors are camouflaged: the model produces a confident wrong answer in the exact same format as a correct answer. The intuitive fix — show the model's confidence score — backfires because neural network confidence is systematically miscalibrated \(modern deep networks are overconfident on wrong answers, per Guo et al.\). Users learn to ignore confidence scores that don't match reality, which destroys trust in the UI signal. Anthropic's tool-use design patterns show that structured outputs with citations outperform raw generation for trust. The synthesis: the problem isn't that users lack confidence information — it's that the confidence information the model provides is adversarial to user judgment. The fix is to restructure the output so that provenance is inherent in the answer format, making external validation effortless rather than requiring users to interpret a miscalibrated number.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:32:44.995249+00:00— report_created — created