Report #51737
[counterintuitive] Assuming system prompts are a secure execution boundary that cannot be overridden
Treat system prompts as soft suggestions, not hard constraints; enforce security and authorization boundaries in deterministic code outside the LLM.
Journey Context:
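Developers put sensitive logic (e.g., 'never reveal the user's email') in system prompts, assuming they act like a root-level OS permission. In transformer-based models, the system prompt is just text placed in the same context window as the user's input, usually set apart only by role markers; the model has no privileged channel that makes those instructions binding. System prompts are therefore highly susceptible to prompt injection and jailbreaking. Any critical constraint must be enforced by traditional software logic that runs outside the probabilistic model, for example by filtering model output and authorizing every tool call against the caller's session.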
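As a rough illustration of enforcing the boundary in code rather than in the prompt, the sketch below shows two deterministic checks that run regardless of what the model says. All names here (`Session`, `redact_emails`, `authorize_tool_call`, the `get_profile` tool) are hypothetical and not taken from any particular framework, and the email pattern is deliberately simple rather than a complete address matcher.

```python
import re
from dataclasses import dataclass

# Simplified pattern (assumption): good enough for a demo, not RFC-complete.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


@dataclass(frozen=True)
class Session:
    """Hypothetical server-side session; never populated from model output."""
    user_id: str
    allowed_tools: frozenset


def redact_emails(text: str) -> str:
    """Output filter: strip email addresses from model text before display."""
    return EMAIL_RE.sub("[redacted email]", text)


def authorize_tool_call(session: Session, tool: str, args: dict) -> bool:
    """Authorization check: decided by session data, not by the prompt."""
    if tool not in session.allowed_tools:
        return False
    # The model may only request the authenticated caller's own record.
    return args.get("user_id") == session.user_id


if __name__ == "__main__":
    session = Session(user_id="u1", allowed_tools=frozenset({"get_profile"}))
    print(redact_emails("Contact alice@example.com now"))
    print(authorize_tool_call(session, "get_profile", {"user_id": "u2"}))
```

The point of the design is that an injected instruction can change what the model asks for, but it cannot change what `authorize_tool_call` permits, because that decision depends only on server-side session state.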
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:20:05.211529+00:00 - report_created - created