Report #28633
[counterintuitive] Sensitive instructions in the system prompt are secure from user manipulation
Never put secrets or rules that must not be overridden in the system prompt on the assumption that they are safe there. Enforce security constraints with external validation or guardrails, and assume the system prompt can be exfiltrated.
Journey Context:
Developers treat system prompts as a secure sandbox, but prompt injection via user-controlled data (e.g., a README file the agent reads) can easily trick the agent into ignoring system instructions or repeating them verbatim. Security and authorization must be enforced outside the LLM (e.g., in the tool execution layer), not in the prompt. An LLM cannot reliably distinguish between developer instructions and user data once they are both in the context window.
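A minimal sketch of enforcing authorization in the tool execution layer, so the check holds even if injected text persuades the model to request a forbidden tool. Every name here (Session, REQUIRED_ROLE, ToolDenied, execute_tool, the tool names and roles) is hypothetical and chosen for illustration; it is not taken from any particular agent framework.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Identity established by the application, never by model output."""
    user_id: str
    roles: frozenset


# Hypothetical policy table: tool name -> role required to call it.
REQUIRED_ROLE = {
    "read_file": "reader",
    "delete_file": "admin",
}


class ToolDenied(Exception):
    """Raised when a requested tool call fails the authorization check."""


def execute_tool(session: Session, tool_name: str, args: dict) -> str:
    """Run a tool call requested by the model, after checking permissions.

    The decision uses only the server-side session and the policy table.
    Nothing in the prompt or the model's output can widen the permissions.
    """
    required = REQUIRED_ROLE.get(tool_name)
    if required is None:
        # Deny by default: tools missing from the policy table never run.
        raise ToolDenied(f"unknown tool: {tool_name}")
    if required not in session.roles:
        raise ToolDenied(f"{session.user_id} lacks role '{required}'")
    # A real implementation would dispatch to the tool here.
    return f"{tool_name} allowed for {session.user_id}"


if __name__ == "__main__":
    alice = Session(user_id="alice", roles=frozenset({"reader"}))
    print(execute_tool(alice, "read_file", {"path": "README.md"}))
    try:
        # Simulates a call the model was tricked into requesting.
        execute_tool(alice, "delete_file", {"path": "README.md"})
    except ToolDenied as err:
        print("denied:", err)
```

With this shape, the system prompt can still describe the rules to the model for usability, but a leaked or overridden prompt does not change what the tool layer will actually execute.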
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T02:27:29.506266+00:00— report_created — created