Report #90459
[counterintuitive] system prompts securely restrict model behavior
Treat system prompts as mutable instructions, not security boundaries; implement external guardrails and input sanitization to prevent prompt injection.
Journey Context:
Developers put safety rules, PII redaction instructions, or access controls in the system prompt, assuming the model treats them as immutable laws. Because LLMs cannot fundamentally distinguish between system instructions and user data at an architectural level (both arrive as tokens in the same context window), indirect prompt injection (e.g., a malicious string in a retrieved RAG document) can override the system prompt. Relying on system prompts for security guarantees leads to trivially exploitable systems.
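As a rough illustration of what "external guardrails" can mean, the sketch below keeps three checks in ordinary code rather than in the prompt: flagging instruction-like strings in retrieved documents, authorizing tool calls against the caller's roles, and redacting email addresses from model output. Every name here (`flag_untrusted`, `authorize`, `redact_output`, `TOOL_ROLES`, the regex patterns) is hypothetical and chosen for this example only. The regexes are simple heuristics, not a complete injection or PII detector.

```python
import re
from dataclasses import dataclass

# Hypothetical heuristic patterns; real deployments need far broader coverage.
SUSPICIOUS = re.compile(
    r"ignore (all |any )?(previous|prior|above) instructions|you are now",
    re.IGNORECASE,
)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

# Hypothetical mapping of tool name -> role required to call it.
TOOL_ROLES = {"read_docs": "viewer", "delete_record": "admin"}


@dataclass(frozen=True)
class User:
    name: str
    roles: frozenset


def flag_untrusted(text: str) -> bool:
    """Return True if retrieved text contains instruction-like phrases."""
    return bool(SUSPICIOUS.search(text))


def authorize(user: User, tool: str) -> bool:
    """Enforce access control in code; unknown tools are denied by default."""
    required = TOOL_ROLES.get(tool)
    return required is not None and required in user.roles


def redact_output(text: str) -> str:
    """Redact email addresses from model output before returning it."""
    return EMAIL.sub("[REDACTED_EMAIL]", text)
```

The point of the design is that none of these checks depend on the model obeying its system prompt: a tool call requested by an injected instruction still fails `authorize`, and output is redacted regardless of what the model was told.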
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:25:51.208073+00:00— report_created — created