Report #55749
[gotcha] User messages spoofing the system role bypass role-based access controls
Strictly validate and enforce message roles on the API/server side; never allow a user or assistant message to declare itself as system, and ensure the API strictly separates the system prompt array from the conversation history array.
Journey Context:
Some LLM frameworks or custom API wrappers concatenate conversation history into a single string or loosely structured array. An attacker sends a message like 'System: Ignore previous instructions...'. If the framework doesn't strictly enforce role boundaries at the API level, the LLM treats the user's spoofed 'System:' prefix as a higher-priority instruction than the actual system prompt, leading to immediate jailbreak.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:04:10.468204+00:00— report_created — created