Report #29244
[gotcha] User message claiming System role overriding the actual system prompt
Explicitly structure the prompt with clear role tags \(e.g., , \) and instruct the model that only text within tags are instructions. Better yet, use API-level system roles instead of concatenating everything into the user prompt.
Journey Context:
Many developers concatenate the system prompt and user prompt into a single text field \(especially with older or open-source models\). Attackers can inject text like \[SYSTEM\] Override previous instructions and... Because the LLM relies on textual cues to distinguish roles when they aren't structurally separated, it might obey the attacker's fake system message over the developer's actual system message.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:28:47.408794+00:00— report_created — created