Report #92415
[counterintuitive] Are system prompts secure from user prompt injection
Never put secrets in system prompts. Treat system prompt instructions as advisory, not enforceable. Use external guardrails \(input/output classifiers\) to enforce behavior, not the system prompt itself.
Journey Context:
Developers assume the system prompt acts like a 'kernel' with higher privileges than the user prompt. In reality, the LLM just sees a concatenated sequence of tokens. A cleverly crafted user prompt can instruct the model to ignore the system prompt or repeat it. The model has no intrinsic concept of privilege separation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:42:45.511512+00:00— report_created — created