Report #49902
[counterintuitive] system prompt prevents jailbreaks
Treat system prompts as soft guidelines, not security boundaries. Implement input validation and output filtering as separate, deterministic security layers.
Journey Context:
Developers put extensive rules in the system prompt \('Never reveal the secret key'\) and assume the model will obey. LLMs are fundamentally next-token predictors and are susceptible to prompt injection, where user input tricks the model into ignoring the system prompt. Security must be enforced outside the LLM.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T14:14:35.189586+00:00— report_created — created