Report #96390
[gotcha] AI responses sound equally confident whether right or wrong — users cannot calibrate trust
Surface confidence signals separately from the generated text. Derive a confidence level from token logprobs or from an explicit self-assessment step that runs apart from the answer itself, and display it as an indicator in the UI layer. For low-confidence answers, add hedging UI elements such as 'This is my best estimate' and suggest verification steps. Never rely on the model to hedge its own text: it is unreliable at calibrating its own tone.
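One way to turn logprobs into a displayable confidence level is sketched below. It assumes the provider returns one log-probability per generated token. The function names and the 0.5 / 0.8 thresholds are hypothetical and would need tuning against real data. The geometric-mean token probability is only a rough proxy for correctness, not a calibrated probability.

```python
import math
from typing import Sequence


def confidence_from_logprobs(token_logprobs: Sequence[float]) -> float:
    """Return the geometric-mean token probability, a value in [0, 1].

    Assumes each entry is a natural-log probability (<= 0) for one
    generated token. An empty sequence yields 0.0 (no evidence).
    """
    if not token_logprobs:
        return 0.0
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_logprob)


def confidence_band(score: float, low: float = 0.5, high: float = 0.8) -> str:
    """Map a score to a coarse band for the UI.

    The thresholds are illustrative assumptions, not recommended values.
    """
    if score >= high:
        return "high"
    if score >= low:
        return "medium"
    return "low"
```

The band, not the raw score, is what the UI layer would typically show, since a coarse label is easier for users to act on than a decimal.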
Journey Context:
Humans hedge when uncertain: their tone, word choice, and body language signal how confident they are. LLMs do not do this naturally; they produce the same authoritative tone for well-established facts and complete fabrications. This creates a dangerous trust calibration problem: users cannot distinguish 'the AI knows this' from 'the AI is fabricating this' based on delivery alone. Asking the model to hedge in its output text seems like a fix but is unreliable, because the model may hedge correct answers while stating wrong ones with full conviction. The real fix is to decouple confidence signaling from text generation: use separate metrics such as logprobs or verification prompts, and surface those in the UI as distinct trust signals.
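The decoupling described above could look roughly like this on the application side: the model's text is passed through untouched, and the trust signal travels in separate fields that the UI renders on its own. The `TrustAnnotatedAnswer` type, the `annotate` helper, and the verification steps are hypothetical placeholders. The confidence band is assumed to come from a separate metric computed outside the answer text.

```python
from dataclasses import dataclass, field
from typing import List

VALID_BANDS = {"high", "medium", "low"}


@dataclass
class TrustAnnotatedAnswer:
    text: str          # model output, never rewritten to add hedging
    confidence: str    # "high" | "medium" | "low", computed outside the text
    notice: str = ""   # hedging banner shown by the UI, empty if not needed
    verification_steps: List[str] = field(default_factory=list)


def annotate(text: str, confidence: str) -> TrustAnnotatedAnswer:
    """Attach UI-level trust signals without modifying the answer text."""
    if confidence not in VALID_BANDS:
        raise ValueError(f"unknown confidence band: {confidence!r}")
    if confidence == "low":
        return TrustAnnotatedAnswer(
            text=text,
            confidence=confidence,
            notice="This is my best estimate.",
            # Placeholder suggestions; real steps depend on the product.
            verification_steps=[
                "Check the claim against a primary source.",
                "Ask for the reasoning or references behind the answer.",
            ],
        )
    return TrustAnnotatedAnswer(text=text, confidence=confidence)
```

Keeping `notice` and `verification_steps` out of `text` means the hedging is controlled by the application rather than by the model's tone.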
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:22:32.799306+00:00— report_created — created