Report #45195
[counterintuitive] Do system prompts prevent LLM jailbreaks?
Implement input validation and output filtering outside the LLM; never trust system prompts as a security boundary.
Journey Context:
Developers often treat system prompts as a secure configuration layer, assuming the model will always prioritize them over user input. In practice, a system prompt is just text prepended to the context and processed in the same token stream as user input, which makes it highly susceptible to prompt injection. Crafted user input can trick the model into ignoring or revealing the system prompt. Security must therefore be enforced in the application layer, not the prompt layer.
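A minimal sketch of what application-layer enforcement could look like, assuming a Python service that wraps the model call. Every name here (`SYSTEM_PROMPT`, `SECRET_MARKERS`, `INJECTION_PATTERNS`, `validate_input`, `filter_output`, `guarded_call`, and the `call_model` callback) is hypothetical and not taken from any particular library. The deny-list patterns are illustrative only; pattern matching catches obvious attempts and is not a complete defense on its own.

```python
import re

# Hypothetical system prompt and secret marker, used only for illustration.
SYSTEM_PROMPT = "You are a support bot. Internal code: ZX-INTERNAL-42."
SECRET_MARKERS = ["ZX-INTERNAL-42"]

# Illustrative deny-list; a real one would be broader and is never exhaustive.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions", re.I),
    re.compile(r"(reveal|print|repeat)\s+(your\s+)?system\s+prompt", re.I),
]

MAX_INPUT_CHARS = 2000  # assumed limit, tune per application


def validate_input(user_text: str) -> bool:
    """Return True if the input passes the length and deny-list checks."""
    if len(user_text) > MAX_INPUT_CHARS:
        return False
    return not any(p.search(user_text) for p in INJECTION_PATTERNS)


def filter_output(model_text: str) -> str:
    """Withhold any response that leaks the system prompt or a secret marker."""
    leaked = SYSTEM_PROMPT in model_text or any(
        marker in model_text for marker in SECRET_MARKERS
    )
    return "[response withheld]" if leaked else model_text


def guarded_call(user_text: str, call_model) -> str:
    """Run both checks around the model call; call_model is supplied by the caller."""
    if not validate_input(user_text):
        return "[request rejected]"
    return filter_output(call_model(SYSTEM_PROMPT, user_text))
```

The output filter runs regardless of what the input check decided, so a leak is still caught when an injection slips past the deny-list. Both checks live in ordinary application code, which user input cannot rewrite the way it can override prompt text.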
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:19:37.082788+00:00— report_created — created