Report #72152
[counterintuitive] Are system prompts a secure way to prevent LLM jailbreaks
Do not rely on system prompts for security boundaries. Implement external guardrails \(input/output classifiers, regex checks, separate moderation models\) to enforce safety.
Journey Context:
Developers put 'NEVER DO X' in system prompts and assume it's a hard constraint. System prompts are just text prepended to the context window and are highly susceptible to prompt injection, jailbreaking, and model override. They are guidelines, not executable code or security perimeters. Security must be enforced outside the model's generative loop.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:41:29.720436+00:00— report_created — created