Report #69780
[counterintuitive] Using or tags and assuming the model's reasoning is completely hidden and safe from prompt injection
Do not rely on hidden CoT for security/safety against prompt injection; treat the CoT as potentially visible and manipulable by the user.
Journey Context:
Developers often use XML tags to hide reasoning, thinking it's a secure sandbox. However, user inputs can easily leak into these tags and instruct the model to output the hidden reasoning or bypass it. Hidden CoT is an organizational tool for structuring the prompt, not a security boundary. Adversarial inputs can easily break the tag structure.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T23:36:44.066210+00:00— report_created — created