Report #91328
[counterintuitive] system prompt secure from user input
Never put secrets in system prompts, and never rely on system prompts as a sole security boundary. Treat user-controlled input as potentially hostile (prompt injection) and use external validation for critical actions.
Journey Context:
Developers often treat the system prompt like server-side code, assuming the model strictly obeys the hierarchy (system > user). In reality, the model sees a single sequence of tokens. Role markers and instruction tuning make it more likely to favor system instructions, but nothing in the architecture enforces that priority: the attention mechanism does not inherently distinguish 'trusted' system tokens from 'untrusted' user tokens, so a prompt injection can override the system prompt.
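A minimal sketch of what "external validation for critical actions" could look like, assuming the model's output has already been parsed into a proposed action. The names `ALLOWED_ACTIONS`, `ProposedAction`, `authorize`, and `execute` are hypothetical and chosen for illustration; they do not come from any particular library. The point is that the permission check runs in application code against the authenticated user's role, so nothing the model says can change its outcome.

```python
from dataclasses import dataclass, field

# Hypothetical policy table: which actions each authenticated role may trigger.
# It lives in application code, not in the system prompt.
ALLOWED_ACTIONS = {
    "viewer": {"get_order_status"},
    "admin": {"get_order_status", "refund_order"},
}


@dataclass
class ProposedAction:
    """An action the model asked for, parsed from its (untrusted) output."""
    name: str
    args: dict = field(default_factory=dict)


def authorize(action: ProposedAction, user_role: str) -> bool:
    """Check the action against the policy table for the caller's role."""
    return action.name in ALLOWED_ACTIONS.get(user_role, set())


def execute(action: ProposedAction, user_role: str) -> str:
    """Run the action only if the server-side check passes."""
    if not authorize(action, user_role):
        return "denied"
    # A real handler would be dispatched here; this sketch only reports it.
    return f"executed {action.name}"


# Even if an injected prompt convinces the model to request a refund,
# the request is rejected because the caller's role does not permit it.
injected = ProposedAction("refund_order", {"order_id": "A1"})
print(execute(injected, "viewer"))
print(execute(injected, "admin"))
```

The role is taken from the application's own session or authentication layer, never from text in the conversation, which is what keeps the check outside the model's reach.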
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T11:53:12.106684+00:00— report_created — created