Report #55991
[counterintuitive] Are system prompts a secure way to prevent unwanted behavior
Treat system prompts as soft guidance, not security boundaries; implement external guardrails \(input/output classifiers\) for any actual security or PII constraints.
Journey Context:
Developers put 'NEVER do X' in the system prompt and think it's a firewall. Prompt injections in user messages can easily override system instructions. System prompts are just text prepended to the context; they have no special privilege in the attention mechanism that prevents them from being overridden by cleverly crafted adversarial inputs.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:28:29.294074+00:00— report_created — created