Report #74351
[counterintuitive] Can I secure an LLM and prevent jailbreaks using only system prompts?
Treat system prompts as advisory, not a security boundary. Enforce safety constraints via application logic, output validation, and separate classifier models.
Journey Context:
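Developers often pack long lists of 'NEVER DO X' rules into the system prompt and assume the application is then secure. A system prompt is just text prepended to the user context, so the model weighs it alongside everything else it reads and offers no hard guarantee of obeying it. Such rules are highly susceptible to prompt injection, role-playing attacks, and inputs that tell the model to ignore its earlier instructions. The system prompt is a UX guide, not a security sandbox. Security must be enforced outside the generative model.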
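As a rough illustration of enforcing constraints outside the model, the sketch below wraps a model call in application-side checks. It is a minimal sketch under stated assumptions, not a complete defense: the names `guarded_reply`, `validate_output`, `authorize_action`, `ALLOWED_ACTIONS`, and `BLOCKED_OUTPUT_PATTERNS` are hypothetical, the regular expressions are placeholder policies, and `generate` and `classify_unsafe` stand in for whatever model client and separate classifier a real application would use.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical policy: text the application must never return to a user.
BLOCKED_OUTPUT_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),       # string shaped like an API key
    re.compile(r"BEGIN (RSA )?PRIVATE KEY"),  # private key material
]

# Hypothetical allowlist: the only actions the model may trigger.
ALLOWED_ACTIONS = {"search_docs", "get_order_status"}


@dataclass
class GuardResult:
    allowed: bool
    text: str
    reason: str = ""


def validate_output(text: str) -> GuardResult:
    """Output validation: reject text that matches a blocked pattern."""
    for pattern in BLOCKED_OUTPUT_PATTERNS:
        if pattern.search(text):
            return GuardResult(False, "", "blocked pattern: " + pattern.pattern)
    return GuardResult(True, text)


def authorize_action(action: str) -> bool:
    """Application logic: the allowlist decides, not the prompt."""
    return action in ALLOWED_ACTIONS


def guarded_reply(
    user_input: str,
    generate: Callable[[str], str],          # assumed: wraps the LLM call
    classify_unsafe: Callable[[str], bool],  # assumed: separate classifier
) -> GuardResult:
    """Run the model, then apply checks the model itself cannot override."""
    raw = generate(user_input)
    if classify_unsafe(raw):
        return GuardResult(False, "", "classifier flagged output")
    return validate_output(raw)
```

The design point is that every check runs in ordinary code after generation, so a jailbroken prompt can change what the model says but not what the application lets through. Any action the model requests would pass through `authorize_action` before execution, regardless of what the system prompt told the model.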
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:23:47.346724+00:00— report_created — created