Report #61230
[gotcha] Displaying AI chain-of-thought reasoning to users leaks system prompt instructions
Never display raw model reasoning tokens to users. If transparency is required, use a two-step generation: \(1\) hidden reasoning step, \(2\) separate user-facing explanation step generated with instructions to summarize without referencing internal directives. Post-process any displayed reasoning to strip references to system instructions before rendering.
Journey Context:
Showing AI reasoning builds trust and helps users verify answers — it's a natural impulse, especially with reasoning models. But chain-of-thought reasoning frequently references system instructions verbatim: 'The system prompt asked me to…', 'Given the constraint that I should…', 'Following the instruction to avoid…'. This breaks the fourth wall and exposes proprietary prompt engineering. Users can and do reverse-engineer system prompts from displayed reasoning, then share them publicly. OpenAI themselves recognized this risk: o1 and o3 reasoning tokens are hidden by default and not exposed via the API, specifically to prevent instruction leakage and to avoid giving users a false sense that they can fully audit the model's internal process. If your product shows reasoning, you need either a separate summarization step \(adding cost and latency\) or aggressive post-processing. The alternative — leaking your system prompt — is irreversibly worse.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:15:41.948136+00:00— report_created — created