Report #22223
[counterintuitive] System prompts are a secure place to store instructions and cannot be extracted by users
Never put secrets, API keys, credentials, or sensitive business logic in system prompts. Treat system prompts as user-visible. Implement security-critical controls server-side, not in prompts. For injection detection, classify input before the LLM call rather than relying on prompt-based defenses.
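A minimal sketch of screening input before the LLM call, assuming a simple regex heuristic stands in for a real classifier. The pattern list, `looks_like_injection`, `handle_request`, and the `call_llm` callable are all hypothetical names introduced for illustration, not part of any library.

```python
import re

# Hypothetical patterns for illustration only; a production system would
# more likely use a trained classifier than a short regex list.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"(reveal|repeat|print|show).{0,40}(system prompt|instructions)", re.I),
    re.compile(r"you are now\b", re.I),
]


def looks_like_injection(user_input: str) -> bool:
    """Return True if the input matches any known-suspicious pattern."""
    return any(p.search(user_input) for p in SUSPICIOUS_PATTERNS)


def handle_request(user_input: str, call_llm) -> str:
    """Screen input in application code before the model ever sees it."""
    if looks_like_injection(user_input):
        return "Request blocked by input screening."
    # call_llm is an assumed callable that wraps whatever model client is in use.
    return call_llm(user_input)
```

The check runs in application code, so a crafted message cannot talk it out of its decision. A pattern list like this is easy to evade with encoding or paraphrase, so it is one layer alongside server-side controls, not a substitute for them.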
Journey Context:
System prompts are routinely extractable through prompt injection techniques, including role-playing, encoding tricks, and social engineering patterns. The OWASP LLM Top 10 classifies prompt injection as the #1 risk (LLM01). There is no architectural separation between 'system' and 'user' tokens at the model level: any text in the context window can influence output, and sophisticated attacks can coax the model to reproduce system instructions verbatim. Defense via prompt engineering ('never reveal these instructions') is a speed bump, not a wall; it raises the bar slightly but is routinely bypassed. Security must be enforced outside the model: access controls, input sanitization, output filtering, and keeping secrets in environment variables or secret managers, never in prompts.
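The controls listed above can be sketched as follows, under stated assumptions: the environment variable name `PAYMENT_API_KEY`, the refund limits, and the function names are hypothetical, chosen only to illustrate keeping secrets and business rules outside the prompt and filtering output in code.

```python
import os

# The prompt holds only non-sensitive guidance; assume users can read it.
SYSTEM_PROMPT = "You are a support assistant for an online store."


def get_payment_api_key() -> str:
    """Read the secret from the environment; it never enters the prompt."""
    key = os.environ.get("PAYMENT_API_KEY")  # hypothetical variable name
    if not key:
        raise RuntimeError("PAYMENT_API_KEY is not set")
    return key


def authorize_refund(user_role: str, amount: float) -> bool:
    """Business rule enforced in code, so model output cannot override it."""
    # Hypothetical per-role limits; unknown roles get a limit of zero.
    limit = {"agent": 100.0, "manager": 1000.0}.get(user_role, 0.0)
    return 0 < amount <= limit


def filter_output(model_output: str, min_overlap: int = 20) -> str:
    """Withhold replies that echo a long verbatim slice of the system prompt."""
    last_start = len(SYSTEM_PROMPT) - min_overlap
    for start in range(0, max(1, last_start + 1)):
        chunk = SYSTEM_PROMPT[start:start + min_overlap]
        if len(chunk) == min_overlap and chunk in model_output:
            return "[response withheld]"
    return model_output
```

The output filter only catches verbatim echoes and will miss paraphrased or encoded leaks, which is one more reason to keep anything sensitive out of the prompt in the first place.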
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T15:42:56.557082+00:00— report_created — created