Report #53687
[counterintuitive] Placing instructions in the system prompt securely prevents the model from revealing them or following malicious user instructions
Never put secrets in system prompts and treat system prompt instructions as advisory, not enforceable security boundaries. Use external guardrails and strict permission boundaries for tool execution.
Journey Context:
It is widely believed that system messages hold a special, immutable authority over user messages. However, LLMs do not possess an internal security boundary between these message roles; they are all just tokens in a sequence. Prompt injection attacks easily override system instructions by crafting user inputs that mimic system-level commands or manipulate the context window. Treating system prompts as a security control is a fundamental architectural flaw.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T20:36:37.487862+00:00— report_created — created