Report #52125
[gotcha] Showing an AI's chain-of-thought reasoning exposes hallucinations and sensitive safety-filter logic
Hide raw chain-of-thought (CoT) from the end user by default. If reasoning must be shown, generate a separate, sanitized 'explanation' step after the main generation instead of streaming the internal CoT directly to the UI.
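A minimal sketch of the two-pass pattern, assuming a generic `generate(prompt) -> str` callable rather than any particular vendor SDK. The `ANSWER:` delimiter, the prompt wording, and the names `solve`, `explain`, `respond`, and `fake_generate` are all hypothetical. The property that matters is that the explanation pass receives only the question and the final answer, never the raw reasoning.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical model interface: any function that takes a prompt and returns text.
Generate = Callable[[str], str]


@dataclass
class ModelOutput:
    reasoning: str  # internal chain-of-thought: keep server-side, never send to the UI
    answer: str


@dataclass
class UserFacingResponse:
    answer: str
    explanation: Optional[str]  # sanitized summary, or None when not requested


def solve(question: str, generate: Generate) -> ModelOutput:
    """Pass 1: the model reasons freely and marks its final answer with the
    hypothetical 'ANSWER:' delimiter."""
    raw = generate(
        "Think step by step, then write 'ANSWER:' followed by the final answer.\n\n"
        + question
    )
    reasoning, sep, answer = raw.rpartition("ANSWER:")
    if not sep:
        # Fail closed: without a delimiter we cannot tell reasoning from answer.
        raise ValueError("model output has no ANSWER: delimiter")
    return ModelOutput(reasoning=reasoning.strip(), answer=answer.strip())


def explain(question: str, answer: str, generate: Generate) -> str:
    """Pass 2: a separate call that sees only the question and the final
    answer, never the raw reasoning, and writes a user-facing explanation."""
    prompt = (
        "Briefly explain to an end user why this answer fits the question. "
        "Do not mention internal policies or guidelines.\n\n"
        f"Question: {question}\nAnswer: {answer}"
    )
    return generate(prompt).strip()


def respond(question: str, generate: Generate, show_explanation: bool = False) -> UserFacingResponse:
    out = solve(question, generate)
    # out.reasoning may be logged server-side for debugging, but it is never returned.
    explanation = explain(question, out.answer, generate) if show_explanation else None
    return UserFacingResponse(answer=out.answer, explanation=explanation)


def fake_generate(prompt: str) -> str:
    """Stand-in model for demonstration only."""
    if prompt.startswith("Think step by step"):
        return "The user asked X, but my safety filter blocks Y. 6 * 7 = 42.\nANSWER: 42"
    return "Multiplying 6 by 7 gives 42."


if __name__ == "__main__":
    print(respond("What is 6 * 7?", fake_generate, show_explanation=True))
```

Because the second call is built from the question and answer alone, the safety-filter remark in the stand-in reasoning cannot leak into the explanation, even if the model would otherwise paraphrase it.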
Journey Context:
Developers often stream the AI's reasoning step to the UI to build trust ('Show your work'). However, internal CoT often contains wild hallucinations, dangerous instructions, or explicit mentions of safety filters ('The user asked for X, but my safety guidelines prevent...'). Exposing this destroys user trust and undermines the safety design. If an explanation is needed, prompt the model to write a clean summary *after* it arrives at the answer, keeping the actual reasoning process a black box.
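When reasoning and answer text arrive on the same stream, the same rule can be enforced before anything reaches the client. This sketch assumes a hypothetical `Chunk` type whose `channel` field is `"reasoning"` or `"answer"`; real streaming APIs label these differently, so the check in `ui_stream` (also a hypothetical name) would need adapting.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass(frozen=True)
class Chunk:
    channel: str  # hypothetical label, e.g. "reasoning" or "answer"
    text: str


def ui_stream(
    chunks: Iterable[Chunk],
    on_hidden: Optional[Callable[[Chunk], None]] = None,
) -> Iterator[str]:
    """Yield only answer text for the client. Everything else, including
    channel labels this code has never seen, is withheld (fail closed) and
    optionally handed to on_hidden for server-side logging."""
    for chunk in chunks:
        if chunk.channel == "answer":
            yield chunk.text
        elif on_hidden is not None:
            on_hidden(chunk)


if __name__ == "__main__":
    stream = [
        Chunk("reasoning", "The user asked for X, but my safety guidelines prevent..."),
        Chunk("answer", "Here is "),
        Chunk("reasoning", "Maybe cite a paper I half remember?"),
        Chunk("answer", "the answer."),
    ]
    hidden = []
    print("".join(ui_stream(stream, on_hidden=hidden.append)))  # prints: Here is the answer.
    print(len(hidden), "chunks kept server-side")
```

The filter is an allowlist rather than a blocklist: a new channel type added by a future API version is hidden until someone explicitly decides it is safe to show.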
Lifecycle
2026-06-19T17:59:12.765005+00:00— report_created — created