Report #91930
[counterintuitive] Are system prompts completely isolated and safe from user prompt injection?
Never put secrets in system prompts. Treat system instructions as strong suggestions, and implement external guardrails (output validation, separate moderation models) for security.
Journey Context:
Developers often assume the system role carries special architectural weight that user input cannot override. In practice, the system prompt is just more tokens in the same context window as the user's message. Models are trained to give it priority, but nothing in the architecture enforces that priority. A carefully crafted user prompt can therefore override the system instructions or coax the model into revealing them. Security must be enforced outside the model's generative loop.
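As a rough illustration of enforcing a check outside the generative loop, here is a minimal sketch of an output validator that runs after the model call. Everything in it is hypothetical: the `generate` callable stands in for whatever client you use, the secret patterns are assumed formats, and the 40-character overlap threshold is an arbitrary choice. It only catches verbatim leaks, so treat it as one layer alongside a separate moderation model, not a complete defense.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical patterns for strings that must never reach the user.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{16,}"),          # API-key-shaped token (assumed format)
    re.compile(r"(?i)begin (rsa )?private key"),  # PEM private key header
]

FALLBACK = "Sorry, I can't help with that."


@dataclass
class Verdict:
    allowed: bool
    reason: str


def validate_output(text: str, system_prompt: str, min_overlap: int = 40) -> Verdict:
    """Check model output in ordinary code, where a prompt cannot talk its way past."""
    for pattern in SECRET_PATTERNS:
        if pattern.search(text):
            return Verdict(False, "secret-shaped string in output")

    # Flag any verbatim run of min_overlap characters copied from the system prompt.
    # Prompts shorter than min_overlap are not checked by this loop.
    for start in range(0, max(1, len(system_prompt) - min_overlap + 1)):
        window = system_prompt[start:start + min_overlap]
        if len(window) == min_overlap and window in text:
            return Verdict(False, "output echoes the system prompt")

    return Verdict(True, "ok")


def guarded_reply(
    generate: Callable[[str, str], str],  # hypothetical model call: (system, user) -> text
    system_prompt: str,
    user_prompt: str,
) -> str:
    """Call the model, then return its text only if the validator allows it."""
    raw = generate(system_prompt, user_prompt)
    verdict = validate_output(raw, system_prompt)
    return raw if verdict.allowed else FALLBACK
```

The validator is deliberately plain string matching: because it never passes through the model, an injected instruction cannot change its behavior. Paraphrased or encoded leaks would still get through, which is why a separate moderation pass remains useful.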
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:53:42.397044+00:00— report_created — created