Report #77799
[counterintuitive] system prompt protects against jailbreaks
Never put secrets in system prompts. Treat system prompts as strong suggestions, not strict code. Implement external validation for any critical instructions.
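One way to read "external validation" is that the application code, not the prompt, has the final say on anything critical. The sketch below assumes a hypothetical setup where the model proposes an action and the application decides whether to run it. The names `ProposedAction`, `ALLOWED_ACTIONS`, `MAX_REFUND_CENTS`, and `validate_action` are illustrative assumptions, not part of any library or of the original report.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical policy values. They live in application code, so no prompt
# injection can change them.
MAX_REFUND_CENTS = 5_000
ALLOWED_ACTIONS = {"refund", "lookup_order"}


@dataclass(frozen=True)
class ProposedAction:
    """An action the model asked for, parsed from its output."""
    name: str
    amount_cents: int = 0


def validate_action(action: ProposedAction) -> Tuple[bool, str]:
    """Check a model-proposed action against rules the model cannot override."""
    if action.name not in ALLOWED_ACTIONS:
        return False, f"action {action.name!r} is not allowed"
    if action.name == "refund" and not (0 < action.amount_cents <= MAX_REFUND_CENTS):
        return False, "refund amount outside policy limit"
    return True, "ok"
```

Under these assumptions, a system prompt line such as "never refund more than $50" becomes a hint for the model, while the check above is what actually enforces the limit.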
Journey Context:
Developers often treat the system prompt like a firewall or secure enclave. In reality, prompt injection (via user input or retrieved documents) can override or leak the system prompt. LLMs are trained to follow instructions, and they often cannot distinguish the authority of a system prompt from a cleverly crafted user prompt that says 'ignore previous instructions'.
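Because the system prompt can leak, some applications add an output-side check before returning a response. The following is a minimal sketch of one such check, assuming the application holds both the system prompt and the model output as strings. The function name `leaks_system_prompt` and the `window` parameter are hypothetical. It only catches near-verbatim leaks; paraphrased, translated, or encoded leaks pass through, so it does not make secrets in the prompt safe.

```python
def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial reformatting does not hide a match."""
    return " ".join(text.lower().split())


def leaks_system_prompt(output: str, system_prompt: str, window: int = 40) -> bool:
    """Return True if any `window`-character slice of the system prompt
    appears verbatim (after normalization) in the model output."""
    out = _normalize(output)
    prompt = _normalize(system_prompt)
    if not prompt:
        return False
    if len(prompt) <= window:
        # Short prompt: compare it as a whole.
        return prompt in out
    # Slide a fixed-size window across the prompt and look for each slice.
    return any(
        prompt[i:i + window] in out
        for i in range(len(prompt) - window + 1)
    )
```

A response that trips the check could be blocked or replaced with a generic refusal; which of these fits depends on the application.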
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:10:48.110170+00:00— report_created — created