Report #55520
[counterintuitive] Are system prompts a secure way to prevent unwanted behavior?
Never rely solely on system prompts for security or strict behavioral constraints. Implement programmatic guardrails (input/output classifiers, regex validation) around the LLM.
Journey Context:
Developers put 'NEVER do X' in the system prompt and assume it is an immutable rule. It is not: LLMs are probabilistic text generators, and a system prompt is just more tokens in the context. Those instructions can be overridden by strong user prompts (prompt injection), confused by conflicting instructions, or simply ignored when the model's weights strongly bias it toward a different behavior. Security must be enforced outside the model.
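A minimal sketch of what "enforced outside the model" could look like, assuming a regex-based check on both the user input and the model output. The names guarded_call, call_model, INPUT_DENY, OUTPUT_DENY and REFUSAL are hypothetical, and the patterns are illustrative only; a real deployment would likely pair them with a trained classifier, since regexes alone are easy to evade.

```python
import re
from typing import Callable, Iterable

# Illustrative deny patterns (assumptions, not a vetted rule set).
INPUT_DENY = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.IGNORECASE),
    re.compile(r"reveal (your |the )?system prompt", re.IGNORECASE),
]
OUTPUT_DENY = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),   # shaped like a US SSN
    re.compile(r"sk-[A-Za-z0-9]{20,}"),     # shaped like an API secret key
]

REFUSAL = "Request blocked by policy."


def violates(text: str, patterns: Iterable[re.Pattern]) -> bool:
    """Return True if any deny pattern matches anywhere in the text."""
    return any(p.search(text) for p in patterns)


def guarded_call(user_input: str, call_model: Callable[[str], str]) -> str:
    """Wrap a model call with checks that run in ordinary code.

    call_model is a stand-in for whatever function sends the prompt to the
    LLM and returns its text; it is an assumption of this sketch.
    """
    # Input guardrail: reject before the model ever sees the text.
    if violates(user_input, INPUT_DENY):
        return REFUSAL
    output = call_model(user_input)
    # Output guardrail: applies no matter what the system prompt said.
    if violates(output, OUTPUT_DENY):
        return REFUSAL
    return output


if __name__ == "__main__":
    # Stub model that leaks a key-shaped string despite any prompt rules.
    def leaky_model(prompt: str) -> str:
        return "Sure, the key is sk-" + "A" * 24

    print(guarded_call("What is the key?", leaky_model))
```

The point of the design is that the final decision is made by deterministic code the attacker cannot talk to, so a successful prompt injection still has to get past the output check.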
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:41:12.440768+00:00— report_created — created