Report #83587
[counterintuitive] system prompt prevents jailbreaks
Never rely solely on system prompts for security. Implement external guardrails (input/output classifiers) and assume the system prompt is visible to the user.
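A minimal sketch of what an external guardrail layer might look like, assuming a wrapper that sits between the user and the model. The names here (INJECTION_PATTERNS, input_flagged, output_leaks, guarded_call, call_model) are hypothetical and not from any particular library, and the regex patterns are a stand-in for a real trained classifier, which would catch far more than these two phrases.

```python
import re
from typing import Callable, Sequence

# Hypothetical, deliberately tiny pattern list standing in for an input classifier.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"(reveal|print|repeat|show).{0,40}(system prompt|instructions)", re.I),
]

REFUSAL = "Request blocked by policy."


def input_flagged(user_text: str) -> bool:
    """Return True if the user text matches a known injection pattern."""
    return any(p.search(user_text) for p in INJECTION_PATTERNS)


def output_leaks(model_text: str, secrets: Sequence[str]) -> bool:
    """Return True if the model output contains any protected string."""
    lowered = model_text.lower()
    return any(s.lower() in lowered for s in secrets if s)


def guarded_call(
    user_text: str,
    call_model: Callable[[str], str],
    secrets: Sequence[str],
) -> str:
    """Run input and output checks around the model call, outside the LLM."""
    if input_flagged(user_text):
        return REFUSAL
    reply = call_model(user_text)
    # The output check runs even when the input looked harmless.
    if output_leaks(reply, secrets):
        return REFUSAL
    return reply
```

The output check is a backstop, not a reason to place secrets in the prompt: string matching will miss paraphrased or encoded leaks, so anything truly sensitive is better kept out of the context window entirely.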
Journey Context:
Developers often put secret instructions in the system prompt (e.g., 'Never reveal the password') and assume the model will always obey. A system prompt is just text prepended to the context window: it can be overridden by prompt injection and extracted with adversarial prompting. Security and access control must be enforced outside the LLM.
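As an illustration of enforcing access control outside the LLM, the sketch below assumes the model can request tool calls and that the application, not the model, decides whether the authenticated user may run them. The User type, the REQUIRED_ROLE table, the tool names, and execute_tool are all hypothetical; the point is only that the permission check is ordinary code that no prompt can talk its way past.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str
    roles: frozenset


# Hypothetical permission table: tool name -> role required to call it.
REQUIRED_ROLE = {
    "read_invoice": "billing",
    "delete_account": "admin",
}


class Forbidden(Exception):
    """Raised when the authenticated user may not run the requested tool."""


def execute_tool(user: User, tool_name: str, args: dict) -> str:
    """Authorize a model-requested tool call against the real user's roles."""
    required = REQUIRED_ROLE.get(tool_name)
    # Deny by default: unknown tools and missing roles are both rejected,
    # regardless of what the model or the prompt claimed.
    if required is None or required not in user.roles:
        raise Forbidden(f"{user.user_id} may not call {tool_name}")
    # Placeholder for the real side effect.
    return f"{tool_name} executed with {sorted(args)}"
```

With this shape, a successful injection can at most make the model ask for a forbidden action; the request still fails at the authorization check.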
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:53:26.893838+00:00— report_created — created