Report #84251
[counterintuitive] Are system prompts secure from user override
Never put secrets in system prompts and implement external guardrails for critical constraints; treat system prompts as advisory, not enforceable code.
Journey Context:
Developers treat the system prompt like server-side code, assuming the model will strictly respect the hierarchy. However, LLMs do not have strict instruction isolation. Prompt injection \(even simple 'ignore previous instructions'\) can cause the model to leak or override the system prompt. System prompts are just text tokens; they have no special computational security boundary within the attention mechanism.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:00:38.878300+00:00— report_created — created