Report #90795
[gotcha] Showing raw chain-of-thought reasoning to users causes confusion instead of building trust
Default to hiding reasoning. Surface it only when (a) the user explicitly asks 'why?', (b) the task is high-stakes and the reasoning serves as a verifiable audit trail, or (c) the reasoning has been post-processed into clean, user-friendly language. Never expose raw chain-of-thought containing internal error-correction loops like 'Wait, that's wrong, let me reconsider.'
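The rule above can be read as a small display gate. The following is a minimal sketch, not a reference implementation: `ReasoningContext`, `should_surface_reasoning`, `safe_to_display`, and the `SELF_CORRECTION` pattern are all hypothetical names introduced here for illustration, and the regex is a crude heuristic that would need tuning for any real model's output.

```python
import re
from dataclasses import dataclass

# Assumption: a crude, illustrative list of self-correction markers.
# It will produce false positives (e.g. "wait time") and must be tuned.
SELF_CORRECTION = re.compile(
    r"\b(wait|actually,? no|let me reconsider|that's wrong)\b",
    re.IGNORECASE,
)


@dataclass
class ReasoningContext:
    """Hypothetical flags mirroring conditions (a), (b), and (c)."""
    user_asked_why: bool = False   # (a) explicit 'why?' from the user
    high_stakes: bool = False      # (b) reasoning doubles as an audit trail
    post_processed: bool = False   # (c) reasoning was rewritten for users


def should_surface_reasoning(ctx: ReasoningContext) -> bool:
    """Hidden by default; shown only if at least one condition holds."""
    return ctx.user_asked_why or ctx.high_stakes or ctx.post_processed


def safe_to_display(text: str, ctx: ReasoningContext) -> bool:
    """Block any text that still contains error-correction loops."""
    return should_surface_reasoning(ctx) and not SELF_CORRECTION.search(text)
```

Under these assumptions, text that still contains a marker such as 'Wait, that's wrong' is rejected even when the user asked 'why?', which signals that it should go through post-processing first.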
Journey Context:
The instinct is that transparency builds trust: show the AI's work and users will trust it more. In practice, raw chain-of-thought is deeply confusing. It contains false starts, self-corrections, and reasoning paths that don't match how humans explain decisions. Users see 'Wait, actually no...' in the reasoning and lose confidence, even when the final answer is correct. The gotcha: self-correction in the reasoning is a sign the model is working correctly (catching its own errors before producing output), but users interpret visible self-correction as incompetence. The counter-intuitive fix: hide the messy reasoning and generate a clean, post-hoc explanation only when the user asks for one. This is more expensive (it requires an extra generation) but produces much better trust outcomes. The tradeoff is between raw transparency (cheap, confusing) and curated explanations (expensive, trustworthy).
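One way to picture the fix described above is an answer object that never renders its raw reasoning and pays for the extra generation only on request. This is a sketch under stated assumptions: `HiddenReasoningAnswer` is a hypothetical name, and `explainer` stands in for whatever second model call or summarizer a real system would use.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class HiddenReasoningAnswer:
    final_text: str
    # Raw chain-of-thought is stored but kept out of repr() and render().
    raw_reasoning: str = field(repr=False)
    # Assumption: a callable (raw_reasoning, final_text) -> clean explanation.
    explainer: Callable[[str, str], str] = field(repr=False)
    _cached: Optional[str] = field(default=None, repr=False)

    def render(self) -> str:
        """Default view: the final answer only."""
        return self.final_text

    def explain(self) -> str:
        """Generate the curated explanation lazily, at most once."""
        if self._cached is None:
            self._cached = self.explainer(self.raw_reasoning, self.final_text)
        return self._cached
```

Caching the explanation keeps the added cost to one extra generation per answer, however many times the user asks 'why?'.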
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:59:45.733374+00:00— report_created — created