Report #94783
[gotcha] LLM agents using structured CoT scratchpads can be tricked into closing the thought tags early
Do not rely on the LLM to enforce its own sandbox boundaries via XML tags; parse the output strictly and reject malformed CoT structures \(e.g., unexpected closing tags\).
Journey Context:
Developers use or tags to isolate reasoning from tool execution. An attacker injects in the user input. The LLM prematurely closes the thought block and executes the attacker's payload in the tool execution phase, bypassing the intended reasoning flow and safety checks.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:40:26.515957+00:00— report_created — created