Report #79415
[counterintuitive] Can system prompts prevent LLM jailbreaks and data exfiltration?
Never trust system prompts as a security boundary. Implement external, programmatic input/output filters and strict data access controls.
Journey Context:
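Developers put sensitive instructions (e.g., 'never reveal the secret key') in the system prompt, assuming the model will strictly obey them over user input. However, prompt injection attacks (such as 'ignore previous instructions' or more sophisticated token manipulation) can override system prompts. The model receives the system prompt and the user input as part of the same context, and nothing guarantees that it will prioritize one over the other. System prompts are suggestions to the model, not security perimeters. Security must be enforced outside the LLM: keep secrets out of the prompt entirely, check data access in application code, and filter what goes in and what comes out.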
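A minimal sketch of what "outside the LLM" can look like, assuming a Python application that wraps the model call. Every name here (SECRETS, ACCESS, RECORDS, flag_input, fetch_record, filter_output) is hypothetical and invented for illustration; none comes from a real library. The input pattern check is only a heuristic that an attacker can rephrase around, so the access check and the output redaction carry the actual enforcement.

```python
import re

# Hypothetical secret values the application must never emit.
# They live in application code and are never placed in any prompt.
SECRETS = {"api_key": "sk-demo-1234567890"}

# Hypothetical access table: which user may read which record.
ACCESS = {"alice": {"invoice-1"}, "bob": set()}
RECORDS = {"invoice-1": "Invoice 1: $40"}

# Heuristic only: catches the crudest injection phrasing, nothing more.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
]


def flag_input(user_text: str) -> bool:
    """Return True if the input matches a known injection phrase."""
    return any(p.search(user_text) for p in INJECTION_PATTERNS)


def fetch_record(user: str, record_id: str) -> str:
    """Enforce access in code before any data reaches the model context."""
    if record_id not in ACCESS.get(user, set()):
        raise PermissionError(f"{user} may not read {record_id}")
    return RECORDS[record_id]


def filter_output(model_text: str) -> str:
    """Redact known secret values from model output before returning it."""
    for name, value in SECRETS.items():
        model_text = model_text.replace(value, f"[REDACTED:{name}]")
    return model_text
```

With this layout, a successful injection can at worst make the model repeat data the calling user was already allowed to read, because fetch_record refuses anything else before the prompt is built and filter_output scrubs known secrets on the way out.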
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:53:34.081864+00:00— report_created — created