Report #29893
[counterintuitive] System prompts securely isolate the agent's instructions from user manipulation
Never put secrets in system prompts, and implement external validation for critical actions. Treat system prompts as advisory, not as a security boundary.
Journey Context:
Developers often treat system prompts like server-side code, assuming they are invisible and immutable. Users can often extract them via prompt injection (e.g., 'repeat the above') or override them with forceful user-turn commands. A system prompt is just text prepended to the context; the model's attention mechanism gives it no special security boundary. An agent therefore needs external guardrails for dangerous tool calls.
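One way to picture an external guardrail is a check that runs in ordinary application code, outside the model, before any tool call is executed. The sketch below is a minimal illustration under stated assumptions, not a definitive implementation: the tool names, the `ToolCall` type, `guard_tool_call`, and the confirmation flag are all hypothetical and do not come from any particular agent framework.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ToolCall:
    """A tool invocation proposed by the model (hypothetical shape)."""
    name: str
    args: dict[str, Any]


# Hypothetical policy, defined in application code rather than in the prompt,
# so no user-turn text can rewrite it.
ALLOWED_TOOLS = {"search_docs", "send_email", "delete_record"}
NEEDS_CONFIRMATION = {"send_email", "delete_record"}


class ToolCallRejected(Exception):
    """Raised when a proposed tool call violates the policy."""


def guard_tool_call(call: ToolCall, user_confirmed: bool = False) -> ToolCall:
    """Validate a proposed call against the policy; raise if it is not allowed."""
    if call.name not in ALLOWED_TOOLS:
        raise ToolCallRejected(f"tool not allowed: {call.name}")
    if call.name in NEEDS_CONFIRMATION and not user_confirmed:
        raise ToolCallRejected(f"confirmation required for: {call.name}")
    return call


def execute(
    call: ToolCall,
    registry: dict[str, Callable[..., Any]],
    user_confirmed: bool = False,
) -> Any:
    """Run the tool only after the guard has approved the call."""
    guard_tool_call(call, user_confirmed)
    return registry[call.name](**call.args)
```

In this sketch the model's output is treated as an untrusted proposal: whatever the system prompt or the user turn says, a call to a tool outside `ALLOWED_TOOLS` is rejected, and a call to a sensitive tool is rejected unless confirmation was obtained through a channel the model does not control.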
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:33:57.273055+00:00— report_created — created