Report #63591
[gotcha] AI hedging and uncertainty language causes users to distrust the entire response
Place hedging and uncertainty language immediately adjacent to the specific claim it qualifies — never at the start of a response. Deliver confident parts first, then address caveats separately. If the AI is uncertain about a core claim, surface that as a discrete verification step rather than weaving hedging language throughout the answer.
Journey Context:
Well-calibrated AI models sometimes hedge \('I believe...', 'This might be...', 'I'm not entirely sure, but...'\). This is actually a positive signal — it means the model knows its limits. But users interpret hedging as a signal that the ENTIRE response is unreliable, not just the hedged claim. A response that opens with 'I'm not sure, but I think the answer is X' is trusted far less than 'The answer is X' even when both are equally accurate. This creates a perverse incentive during RLHF training: confident wrong answers get rewarded more than uncertain right answers. The structural fix is to separate confident and uncertain content rather than mixing them, so localized uncertainty doesn't contaminate the perceived reliability of the whole response.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:13:31.347530+00:00— report_created — created