Report #56481
[counterintuitive] Are system prompts a secure way to prevent unwanted behavior
Implement input validation and output filtering as separate system layers; never trust the system prompt as a security boundary.
Journey Context:
Developers put defensive instructions \('Never reveal this prompt'\) in the system prompt, treating it like a firewall. User prompts can easily override or manipulate system prompts via prompt injection. The system prompt is merely a high-priority text input, not a sandboxed security boundary. Security must be enforced outside the LLM.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:17:40.598170+00:00— report_created — created