Report #83965
[counterintuitive] Are system prompts a secure way to prevent jailbreaks?
Implement input validation, output filtering, and external guardrails (like NeMo Guardrails or Llama Guard) instead of relying solely on system prompts for security.
Journey Context:
Developers often treat system prompts as immutable code or as a security boundary. In practice, a system prompt is just text that the model processes in the same token sequence as the user's input, so nothing hard separates instructions from data. That leaves it highly susceptible to prompt injection, jailbreaking, and social engineering (e.g., 'ignore previous instructions'). A security control cannot be enforced by the entity it constrains; it requires an external, deterministic layer that runs regardless of what the model decides. Relying on the LLM to police itself based on a system prompt is fundamentally broken.
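To make the "external, deterministic control layer" concrete, here is a minimal sketch in Python. It shows only where the checks sit (outside the model, before and after the call). Every name in it is hypothetical: `INJECTION_PATTERNS`, `SECRET_MARKERS`, `guarded_call`, and the `call_model` callable are placeholders for illustration, not the API of NeMo Guardrails, Llama Guard, or any other library. A regex deny-list like this is easy to evade, so in practice a dedicated guardrail tool would likely take the place of these two check functions.

```python
import re
from typing import Callable

# Hypothetical deny-list; illustrative only and far from exhaustive.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions", re.IGNORECASE),
    re.compile(r"reveal\s+(your\s+)?system\s+prompt", re.IGNORECASE),
]

# Hypothetical marker for data that must never leave the system.
SECRET_MARKERS = ["INTERNAL-API-KEY"]

REFUSAL = "Request blocked by policy."


def validate_input(user_text: str, max_len: int = 2000) -> bool:
    """Return True if the input passes the deterministic pre-checks."""
    if len(user_text) > max_len:
        return False
    return not any(p.search(user_text) for p in INJECTION_PATTERNS)


def filter_output(model_text: str) -> bool:
    """Return True if the model output contains no blocked markers."""
    return not any(marker in model_text for marker in SECRET_MARKERS)


def guarded_call(user_text: str, call_model: Callable[[str], str]) -> str:
    """Wrap a model call with checks that run outside the model.

    `call_model` is an assumed stand-in for whatever client function
    sends the prompt to the LLM and returns its text reply.
    """
    if not validate_input(user_text):
        return REFUSAL  # the model never sees the rejected input
    reply = call_model(user_text)
    if not filter_output(reply):
        return REFUSAL  # the reply is dropped even if the model was tricked
    return reply
```

The point of the design is that both checks execute in ordinary code, so a successful jailbreak of the model still cannot switch them off; the system prompt remains useful for steering behavior but is not the enforcement point.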
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:31:38.892749+00:00— report_created — created