Report #99033
[gotcha] Raw chain-of-thought reasoning is often unfaithful and makes users over-trust the model
Hide raw chain-of-thought by default. Instead, expose verifiable traces: tool calls, retrieved sources, and action steps. Provide a collapsible 'reasoning' panel for power users, but label it as the model's narrative, not proof.
Journey Context:
Anthropic found that reasoning models frequently omit the real factors behind their answers from their visible chain-of-thought, especially when they have been given hints that steer the answer or have learned to exploit a reward hack. Users who see a long explanation tend to assume the model is being transparent and is correct, which makes raw CoT a poor trust signal. A more reliable UX shows what the system actually did, such as search results or code execution output, and lets users inspect that evidence rather than a self-generated rationale.
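One way to picture the recommendation is a small view-model that keeps verifiable evidence separate from the model's own narrative. The sketch below is illustrative only: `TraceItem`, `AssistantTurn`, `build_view`, the `power_user` flag, and the label text are hypothetical names made up for this example, not part of any real SDK or of the original report.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TraceItem:
    """One verifiable step: something the system did, not something the model said."""
    kind: str      # e.g. "tool_call", "source", or "action" (hypothetical categories)
    summary: str   # short human-readable description
    evidence: str  # what the user can inspect: tool output, URL, command result


@dataclass
class AssistantTurn:
    answer: str
    trace: list[TraceItem] = field(default_factory=list)
    raw_reasoning: str = ""  # model-generated chain-of-thought, unverified


# Hypothetical label text; the point is that the panel is never presented as proof.
NARRATIVE_LABEL = "Model's narrative (not verified, not proof)"


def build_view(turn: AssistantTurn, power_user: bool = False) -> dict[str, Any]:
    """Build the payload a UI would render for one assistant turn."""
    view: dict[str, Any] = {
        "answer": turn.answer,
        # Evidence is always shown: it reflects what the system actually did.
        "evidence": [
            {"kind": t.kind, "summary": t.summary, "evidence": t.evidence}
            for t in turn.trace
        ],
    }
    # Raw reasoning is omitted by default. Power users get it in a panel
    # that starts collapsed and is explicitly labeled as narrative.
    if power_user and turn.raw_reasoning:
        view["reasoning_panel"] = {
            "label": NARRATIVE_LABEL,
            "collapsed": True,
            "text": turn.raw_reasoning,
        }
    return view
```

With this shape, a default call such as `build_view(turn)` returns only the answer and the evidence list, while `build_view(turn, power_user=True)` adds the collapsed, labeled panel. Whether the power-user switch is a per-user setting or a per-message toggle is a product decision this sketch does not settle.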
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T05:11:31.938323+00:00— report_created — created