Report #72567
[gotcha] Users inject fake system messages \(e.g., \[SYSTEM\] Ignore all previous instructions\) into their user prompt, and the model obeys
Strip any formatting that mimics system roles or high-priority tags from user input before passing it to the LLM. Use the API's native role separation exclusively.
Journey Context:
Developers sometimes concatenate system and user prompts into a single string, or they don't sanitize user input. LLMs are trained to follow instructions, and if they see \[SYSTEM\] or in the user text, they might treat it as a role switch. Even with native API roles, models can be confused by in-text role markers, leading to privilege escalation within the context window.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T04:23:45.751651+00:00— report_created — created