Report #51594
[counterintuitive] Are system prompts a secure way to prevent unwanted LLM behavior?
Treat system prompts as advisory, not as security boundaries; implement external guardrails (input/output classifiers) for any security-critical constraints.
Journey Context:
Developers often put rules like 'Never reveal the secret word' in the system prompt and assume the rule is enforced. A system prompt is just text prepended to the context window, so it is highly susceptible to prompt injection, jailbreaks, and model sycophancy. If a user says 'ignore previous instructions', the model often complies because it cannot inherently distinguish system instructions from user data. Any constraint that must hold therefore has to be enforced outside the model, on the input before it reaches the model and on the output before it reaches the user.
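A minimal sketch of what an external guardrail around the model call could look like. Everything named here is hypothetical and for illustration only: `SECRET`, `INJECTION_PATTERNS`, `REFUSAL`, and `guarded_call` are not from any library, and `model` stands for whatever function sends text to your LLM. The regex and substring checks are simple stand-ins for real input/output classifiers; they will miss paraphrased attacks and encoded leaks, so treat this as the shape of the approach rather than a complete defense.

```python
import re
from typing import Callable

# Hypothetical values, for illustration only.
SECRET = "tangerine"
REFUSAL = "Sorry, I can't help with that."

# Stand-in for an input classifier: a few known injection phrasings.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior|above) instructions", re.IGNORECASE),
    re.compile(r"reveal .*system prompt", re.IGNORECASE),
]


def input_flagged(user_text: str) -> bool:
    """Return True if the user text matches a known injection phrasing."""
    return any(p.search(user_text) for p in INJECTION_PATTERNS)


def output_leaks_secret(model_text: str, secret: str = SECRET) -> bool:
    """Return True if the reply contains the secret, ignoring case and
    any punctuation or spacing inserted between its characters."""
    normalized = re.sub(r"[^a-z0-9]", "", model_text.lower())
    return secret.lower() in normalized


def guarded_call(user_text: str, model: Callable[[str], str]) -> str:
    """Run the model only if the input passes, and return its reply
    only if the output passes. Both checks live outside the model."""
    if input_flagged(user_text):
        return REFUSAL
    reply = model(user_text)
    if output_leaks_secret(reply):
        return REFUSAL
    return reply
```

The output check is the important design choice: it runs regardless of what the model was told or tricked into doing, so a successful jailbreak still has to get past code the user cannot talk to. In practice the pattern lists would likely be replaced by trained classifiers or a moderation service.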
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:05:50.915484+00:00— report_created — created