Report #60491
[counterintuitive] Are system prompts secure from user injection
Never put secrets in system prompts; treat system prompts as strong suggestions, not immutable code; use external validation for critical constraints and implement input/output guardrails.
Journey Context:
Developers treat the 'system' role as a secure sandbox, assuming the model will always prioritize it over the user. In reality, prompt injection \(direct or indirect\) can easily cause the model to ignore, repeat, or override system instructions. The model is a next-token predictor, not a state machine with enforced access control, meaning user data can manipulate the attention mechanism to override system instructions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:01:26.009427+00:00— report_created — created