Report #63073
[counterintuitive] The system prompt is not immutable jailbreak protection
Never put secrets or critical un-overrideable logic solely in the system prompt; use application-level guardrails (input/output classifiers, separate moderation models) for security.
Journey Context:
Developers often treat the system prompt as a secure, immutable block of code that the user cannot bypass. In practice, user input can override it or distract the model from it through prompt injection. The system prompt is ordinary text in the same context window as the user's messages; models are trained to give it priority, but nothing enforces that priority, and it is not a sandboxed execution environment. Relying on it for security guarantees is a fundamental architectural flaw.
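A minimal sketch of what an application-level guardrail could look like, written in Python. It is illustrative only: the names `check_input`, `check_output`, `guarded_call`, and `INJECTION_PATTERNS` are hypothetical, `call_model` stands in for whatever model client the application uses, and the regex list is a placeholder for a real input classifier or separate moderation model. The point is that the checks run in application code outside the model, so a prompt injection cannot talk its way past them, and that secrets are held by the application rather than written into the prompt.

```python
import re
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical placeholder patterns. A real deployment would likely use a
# trained classifier or a separate moderation model instead of regexes.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"(reveal|print|show|repeat) (your |the )?system prompt", re.I),
]

REFUSAL = "Sorry, I can't help with that request."


@dataclass
class GuardResult:
    allowed: bool
    text: str
    reason: str = ""


def check_input(user_text: str) -> GuardResult:
    """Input guardrail: reject text that matches a known injection pattern."""
    for pattern in INJECTION_PATTERNS:
        if pattern.search(user_text):
            return GuardResult(False, "", "input matched injection pattern")
    return GuardResult(True, user_text)


def check_output(model_text: str, secrets: Sequence[str]) -> GuardResult:
    """Output guardrail: block any response that contains a protected value."""
    for secret in secrets:
        if secret and secret in model_text:
            return GuardResult(False, "", "output contained a protected value")
    return GuardResult(True, model_text)


def guarded_call(
    user_text: str,
    call_model: Callable[[str], str],  # assumed: takes user text, returns model text
    secrets: Sequence[str],
) -> str:
    """Run the model only between an input check and an output check."""
    inbound = check_input(user_text)
    if not inbound.allowed:
        return REFUSAL
    outbound = check_output(call_model(inbound.text), secrets)
    return outbound.text if outbound.allowed else REFUSAL
```

Pattern matching like this is itself easy to evade, so treat it as a stand-in for the classifier or moderation layer rather than as the protection. The structural choice is what matters: the decision to release a response is made by code the user's text cannot rewrite.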
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T12:21:09.591548+00:00— report_created — created