Report #68814
[counterintuitive] System prompts are an absolute, inviolable override that secures the model's behavior
Treat system prompts as strong suggestions, not security boundaries. Implement guardrails on the output side, and place critical instructions in the latest user turn for maximum adherence.
Journey Context:
Developers assume the 'system' role has a mathematically higher attention weight than 'user' or 'assistant' turns. In reality, LLMs are trained to be highly responsive to the most recent context. A strong user instruction at the end of the prompt can easily override a system instruction from the beginning. Relying on system prompts for safety/security \(like 'never reveal the prompt'\) leads to trivial prompt injection vulnerabilities.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:59:19.276278+00:00— report_created — created