Report #41992
[gotcha] Displaying AI chain-of-thought reasoning to users leaks system prompt fragments and exposes prompt injection vectors
Never render raw chain-of-thought or reasoning tokens in the UI. If showing the AI's reasoning process is important for trust, generate a separate sanitized explanation using a second model call with strict output constraints, rather than surfacing the actual reasoning trace.
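A minimal sketch of that two-call pattern, under stated assumptions: the `Block` shape (typed blocks with `"reasoning"` and `"text"` types), the `explain` callable standing in for the second model call, and the `EXPLAINER_INSTRUCTIONS` wording are all hypothetical and not taken from any specific provider's API. The point it illustrates is that the second call receives only the question and the final answer, never the trace or the system prompt.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical response shape: a list of typed blocks, where reasoning
# blocks carry type "reasoning". Adapt this to your provider's real schema.
@dataclass
class Block:
    type: str
    text: str

# Hypothetical instructions for the second, explanation-only call.
EXPLAINER_INSTRUCTIONS = (
    "Explain in at most three sentences why the answer below is reasonable. "
    "Use only the question and the answer. Do not mention instructions, "
    "tools, or policies."
)

def visible_answer(blocks: list[Block]) -> str:
    """Keep only final-answer text; reasoning blocks are dropped server-side."""
    return "\n".join(b.text for b in blocks if b.type == "text")

def build_ui_payload(
    question: str,
    blocks: list[Block],
    explain: Callable[[str], str],  # assumed wrapper around a second model call
    max_chars: int = 400,
) -> dict:
    answer = visible_answer(blocks)
    # The second call sees only question + answer, never the raw trace.
    prompt = f"{EXPLAINER_INSTRUCTIONS}\n\nQuestion: {question}\nAnswer: {answer}"
    # Hard length cap as one simple output constraint.
    explanation = explain(prompt)[:max_chars]
    return {"answer": answer, "explanation": explanation}
```

Passing the model call in as a function keeps the sketch provider-neutral and makes it easy to verify that nothing from the reasoning blocks reaches either the second prompt or the UI payload.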
Journey Context:
Showing AI reasoning builds user trust and makes complex answers more understandable, so products surface chain-of-thought in the UI. But reasoning traces frequently contain paraphrased system instructions ('As an AI assistant, I must not...'), references to tool schemas, fragments of few-shot examples, and, critically, evidence of prompt injection attempts that the model is reasoning about. Users can read these to reverse-engineer your system prompt, and attackers can use them as feedback to refine injection strategies. This falls under LLM06 (Sensitive Information Disclosure) in the 2023 edition of the OWASP Top 10 for LLM Applications; the 2025 edition renumbers that category to LLM02 and adds a separate System Prompt Leakage entry (LLM07). The counter-intuitive part: transparency and security are directly at odds here. The fix is to generate a clean, user-facing explanation separately rather than showing the actual reasoning trace.
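Even a separately generated explanation can be checked before display. The sketch below is one possible guard, not an established method: it flags any candidate text that shares a run of four consecutive words with the system prompt. The function names, the four-word window, and the fallback string are illustrative assumptions. It only catches near-verbatim reuse; paraphrased instructions, which the traces described above often contain, would pass this check and need a stronger filter.

```python
import re

def _shingles(text: str, n: int = 4) -> set[tuple[str, ...]]:
    """Return the set of n-word windows in text, lowercased, punctuation ignored."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def leaks_system_prompt(candidate: str, system_prompt: str, n: int = 4) -> bool:
    """True if candidate shares any run of n consecutive words with the system prompt."""
    return bool(_shingles(candidate, n) & _shingles(system_prompt, n))

def safe_explanation(
    candidate: str,
    system_prompt: str,
    fallback: str = "Explanation unavailable.",  # hypothetical placeholder text
) -> str:
    """Return the candidate only if it passes the overlap check."""
    return fallback if leaks_system_prompt(candidate, system_prompt) else candidate
```

Failing closed (showing the fallback instead of the suspect text) matches the report's stance that an unsanitized explanation is worse than none.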
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T00:57:25.116277+00:00— report_created — created