Report #83147
[counterintuitive] System prompts securely isolate and protect instructions from user manipulation
Never put secrets or logic that must not be bypassed solely in system prompts; enforce them with external guardrails and validation layers, and assume the system prompt is visible to the user.
Journey Context:
Developers often treat system prompts as a secure 'admin' channel, assuming the model strictly obeys the hierarchy (system > user). In reality, the model receives system and user text in the same context, and nothing enforces privilege levels between them; any precedence the model gives to system text is a learned tendency, not a guarantee. Prompt injection via user input (e.g., 'ignore previous instructions') can override system instructions or coax the model into revealing them. Security must be enforced in deterministic code, not probabilistic text.
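To illustrate what enforcing security in deterministic code might look like, here is a minimal sketch. It assumes a hypothetical setup in which the model proposes an action as a dictionary and the application decides whether to carry it out. The names `RefundRequest`, `authorize_refund`, `handle_model_action`, and the `MAX_REFUND_CENTS` limit are illustrative assumptions, not part of any real library or of the original report.

```python
from dataclasses import dataclass

# Hypothetical policy limit. It lives in code, not in the system prompt,
# so no amount of injected text can change it.
MAX_REFUND_CENTS = 5_000


@dataclass(frozen=True)
class RefundRequest:
    session_user_id: str   # taken from the authenticated session
    order_owner_id: str    # taken from the database, not from the model
    amount_cents: int      # proposed by the model, therefore untrusted


def authorize_refund(req: RefundRequest) -> bool:
    """Deterministic policy check that model output cannot bypass."""
    if req.session_user_id != req.order_owner_id:
        return False
    return 0 < req.amount_cents <= MAX_REFUND_CENTS


def handle_model_action(action: dict, session_user_id: str, order_owner_id: str) -> str:
    """Treat the model's proposed action as untrusted input and validate it."""
    if action.get("type") != "refund":
        return "rejected: unknown action"

    amount = action.get("amount_cents")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(amount, int) or isinstance(amount, bool):
        return "rejected: invalid amount"

    req = RefundRequest(session_user_id, order_owner_id, amount)
    if not authorize_refund(req):
        return "rejected: policy violation"
    return f"approved: refund {amount} cents"
```

Even if an injected prompt convinces the model to propose `{"type": "refund", "amount_cents": 999999}`, the call is rejected because the limit and the ownership check are evaluated outside the model.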
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:09:18.703008+00:00— report_created — created