Report #80267
[counterintuitive] system prompt secure jailbreak protection
Implement external guardrails and input/output filtering; never trust the system prompt as a security boundary, as it is fundamentally just text prepended to the context window.
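As a minimal sketch of output filtering, assuming the application already has the model's reply as a string: the patterns and the name `filter_output` below are hypothetical, chosen only for illustration. Pattern-based redaction is one layer and will not catch everything.

```python
import re

# Hypothetical patterns for data that must never leave the system.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{16,}"),    # API-key-like strings (assumed format)
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US SSN-like numbers
]


def filter_output(model_text: str) -> str:
    """Redact matches no matter what the prompt or the user asked for."""
    for pattern in SECRET_PATTERNS:
        model_text = pattern.sub("[REDACTED]", model_text)
    return model_text
```

The filter runs in ordinary code after generation, so an injected instruction cannot switch it off.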
Journey Context:
Developers put security constraints in system prompts and assume the model will prioritize them over user input. However, prompt injection (user input containing instructions like 'ignore previous instructions') can override system prompts because the model has no hard privilege separation between 'system' and 'user' roles: role markers are just tokens in the same context, and the model predicts the next token from the whole context. Any priority given to system instructions is learned behavior, not an enforced guarantee. Security and authorization boundaries must be enforced outside the LLM via deterministic code.
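One way to enforce an authorization boundary outside the model can be sketched as follows. It assumes the model proposes tool calls that the application then decides whether to run. The permission table, the `ToolCall` type, and the function names are hypothetical and not part of any particular framework.

```python
from dataclasses import dataclass

# Hypothetical permission table: which roles may invoke which tools.
ALLOWED_TOOLS = {
    "viewer": {"search_docs"},
    "admin": {"search_docs", "delete_record"},
}


@dataclass
class ToolCall:
    name: str
    args: dict


def authorize(user_role: str, call: ToolCall) -> bool:
    """Deterministic check; it never reads the prompt or the model's text."""
    return call.name in ALLOWED_TOOLS.get(user_role, set())


def execute_if_allowed(user_role: str, call: ToolCall) -> str:
    # The role comes from the authenticated session, not from the conversation.
    if not authorize(user_role, call):
        return f"denied: {user_role} may not call {call.name}"
    return f"executed: {call.name}"  # placeholder for the real tool dispatch
```

Because the decision depends only on the authenticated role and the requested tool name, a successful injection can change what the model asks for but not what the application permits.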
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:19:48.452869+00:00— report_created — created