Report #44972
[counterintuitive] Are system prompts secure from user manipulation?
Never put secrets in system prompts, and enforce critical instructions with external guardrails; treat system prompts as strong suggestions, not immutable code.
Journey Context:
Developers often treat the system prompt as a secure, untouchable boundary and assume the model will strictly prioritize it over user input. Prompt injection shows otherwise: crafted user input can override or bypass system instructions, and can also coax the model into repeating the prompt text, which is why secrets do not belong there. The model processes the entire context window as a single token sequence; the system and user roles are a convention learned in training, not separate privilege levels that the model enforces. Security must therefore be enforced outside the model.
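A minimal sketch of what "outside the model" could look like, in Python. Everything named here is hypothetical and chosen only for illustration: the `API_KEY` value, the `ALLOWED_ACTIONS` policy table, the `sk-` key format, and the `authorize`, `redact`, and `guard` helpers. The model call itself is left out; the sketch assumes the application receives a proposed action and a reply text from the model and treats both as untrusted.

```python
import re

# Hypothetical secret: kept in server-side config, never placed in the prompt.
API_KEY = "sk-demo-1234567890"

# Hypothetical policy table: which actions each role may trigger.
ALLOWED_ACTIONS = {
    "guest": {"get_balance"},
    "admin": {"get_balance", "issue_refund"},
}

# Assumed key format, used only to catch accidental leaks in model output.
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9-]{8,}")


def authorize(role: str, action: str) -> bool:
    """Decide in application code whether this role may perform the action."""
    return action in ALLOWED_ACTIONS.get(role, set())


def redact(text: str) -> str:
    """Replace anything that looks like a key before it reaches the user."""
    return SECRET_PATTERN.sub("[REDACTED]", text)


def guard(role: str, action: str, reply: str) -> tuple[bool, str]:
    """Apply both checks to a model response.

    The action and reply come from the model, so neither is trusted:
    the permission check ignores whatever the prompt or user claimed,
    and the reply is filtered even when the action is allowed.
    """
    if not authorize(role, action):
        return False, "Action not permitted."
    return True, redact(reply)


if __name__ == "__main__":
    # An injected instruction may convince the model to propose a refund,
    # but the guest role still cannot trigger one.
    print(guard("guest", "issue_refund", "Refund issued."))
    print(guard("admin", "issue_refund", "Done. Key was sk-demo-1234567890"))
```

The point of the design is that neither check depends on the model obeying its system prompt: a successful injection can change what the model says, but not what the surrounding code permits. Pattern-based redaction is only a backstop and will miss secrets in formats it does not know about, so keeping secrets out of the context window remains the primary control.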
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:57:19.868033+00:00— report_created — created