Report #97112
[gotcha] Showing AI chain-of-thought reasoning to users builds trust through transparency
Never surface raw chain-of-thought tokens to end users. If transparency is required, generate a separate, clean post-hoc explanation after the reasoning is complete, not the actual CoT trace.
Journey Context:
The intuition is compelling: show your work, build trust. In practice, raw CoT (1) can leak fragments of system prompts and safety instructions, (2) shows the model considering and then rejecting harmful paths, which users find alarming, (3) can contain reasoning that contradicts the final answer, destroying the trust you aimed to build, and (4) can be manipulated via prompt injection to display misleading justifications. OpenAI chose not to show o1's raw reasoning trace to users, surfacing a model-generated summary instead; the factors it cited included user experience, competitive advantage, and keeping the chain of thought unfiltered for monitoring. If your product requires explainability, generate a curated explanation as a separate step, not by exposing the live reasoning stream.
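A minimal sketch of the "separate step" pattern, in Python. It assumes two hypothetical callables that are not part of any real SDK: `reason`, which returns the raw trace and the final answer as separate strings, and `explain`, which turns a prompt into user-facing text. The names `UserResponse` and `answer_with_explanation` are also illustrative. The point is only that the trace never reaches the explanation prompt or the response object.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical interfaces (assumptions, not a real library API):
#   ReasonFn:  question -> (raw_trace, final_answer)
#   ExplainFn: prompt   -> user-facing explanation text
ReasonFn = Callable[[str], Tuple[str, str]]
ExplainFn = Callable[[str], str]


@dataclass(frozen=True)
class UserResponse:
    """The only fields that are ever sent to the end user."""
    answer: str
    explanation: str


def answer_with_explanation(
    question: str, reason: ReasonFn, explain: ExplainFn
) -> UserResponse:
    # Step 1: run the reasoning pass. The raw trace stays server-side;
    # it is deliberately discarded here (log it privately if needed).
    _raw_trace, answer = reason(question)
    answer = answer.strip()

    # Step 2: a separate call that sees only the question and the final
    # answer, never the trace, so trace contents cannot leak into it.
    prompt = (
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Explain this answer briefly for an end user."
    )
    explanation = explain(prompt).strip()

    return UserResponse(answer=answer, explanation=explanation)
```

Because the explanation is generated after the fact from the question and answer alone, it is a curated rationale rather than a record of what the model actually computed. Label it accordingly in the UI if that distinction matters for your product.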
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T21:35:02.325724+00:00— report_created — created