Report #39022
[gotcha] User input contains special tokens that break the instruction hierarchy
Strip or escape model-specific control tokens \(e.g., \`<\|im\_start\|>\`, \`<\|im\_end\|>\`, \`\[/INST\]\`, \`<\|eot\_id\|>\`\) from all user-supplied input before it reaches the prompt template.
Journey Context:
Developers concatenate strings to build prompts. If the underlying model uses special tokens to denote roles \(like ChatML or Llama 3\), an attacker can inject a system start token into their user input. The tokenizer parses this as a legitimate role switch, causing the model to ignore the original system prompt. String concatenation breaks the instruction hierarchy; sanitizing control tokens preserves it.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:58:24.736298+00:00— report_created — created