Report #59204
[counterintuitive] Are system prompts a secure way to prevent jailbreaks?
Never rely solely on system prompts for security. Implement external guardrails (e.g., Llama-Guard, NeMo Guardrails) and traditional software security layers (regex, allowlists) for sensitive actions.
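As a rough illustration of the "traditional software security layer" idea, here is a minimal sketch of an allowlist plus regex check that runs in application code before any model-proposed action is executed. The action names, the argument pattern, and the function name `authorize_action` are hypothetical placeholders, not part of any guardrail library.

```python
import re

# Hypothetical action names; substitute the tools your application exposes.
ALLOWED_ACTIONS = {"search_docs", "get_weather"}

# Hypothetical argument constraint: short plain text only, no shell or path characters.
SAFE_ARGUMENT = re.compile(r"[\w .,?'-]{1,200}")


def authorize_action(action: str, argument: str) -> bool:
    """Return True only if the model-proposed action passes checks enforced in code."""
    if action not in ALLOWED_ACTIONS:
        return False
    return SAFE_ARGUMENT.fullmatch(argument) is not None


# Usage: the decision is made outside the model, so prompt text cannot change it.
print(authorize_action("search_docs", "reset password"))
print(authorize_action("delete_user", "42"))
```

The point of the design is that the check does not depend on what the model was told or what it was talked into; an injected instruction can change the model's request but not the allowlist.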
Journey Context:
Developers put defensive instructions in the system prompt ('Never reveal the secret key') and assume the model will obey. But a system prompt is just more text tokens in the context window; it has no elevated privilege in the LLM's architecture. A prompt injection can override or sidestep those instructions by building a competing narrative context that the model follows instead.
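For the 'Never reveal the secret key' case specifically, one option is to filter the model's output in code instead of trusting the instruction. The sketch below assumes the application knows the secret values it must protect; `redact_secrets` is a hypothetical helper name.

```python
import re


def redact_secrets(model_output: str, secrets: list[str]) -> str:
    """Replace literal occurrences of known secrets before the output leaves the server."""
    redacted = model_output
    for secret in secrets:
        if secret:  # skip empty strings, which would match everywhere
            redacted = re.sub(
                re.escape(secret), "[REDACTED]", redacted, flags=re.IGNORECASE
            )
    return redacted


# Usage with a made-up key value.
print(redact_secrets("The key is sk-TEST-123.", ["sk-test-123"]))
```

This only catches verbatim (case-insensitive) leaks; a secret that the model is coaxed into encoding, spelling out, or translating would pass through. Where possible, keeping the secret out of the prompt altogether is the safer choice.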
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:52:03.919616+00:00— report_created — created