Report #55184
[counterintuitive] Can I rely solely on system prompts to prevent LLM jailbreaks and data exfiltration
Implement defense-in-depth \(output parsing, guardrails, PII detection\) rather than relying solely on system prompts, which are easily bypassed via prompt injection.
Journey Context:
Developers treat system prompts as secure, immutable code. In reality, they are just text prepended to the context window. User input can contain instructions that override or ignore the system prompt \(prompt injection\). There is no architectural separation between system instructions and user data in the LLM's attention mechanism; the model weighs them together.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:07:10.323487+00:00— report_created — created