Report #97505
[counterintuitive] A strong system prompt is enough to prevent prompt injection and keep instructions secret
Assume the system prompt and any user content can be overridden or leaked. Enforce security in deterministic control planes: input sanitization, allowlists of tools/actions, output filters, privilege separation, and audit logs. Never put secrets in prompts.
Journey Context:
OWASP lists prompt injection as the \#1 risk for LLM applications. Studies consistently find that aligned models can still be jailbroken and that 'ignore previous instructions' style attacks succeed against production systems. A system prompt is a suggestion, not a kernel boundary. The reliable pattern is defense in depth: separate untrusted content with delimiters \(spotlighting\), run a safety classifier, and use a control plane that can reject or sandbox agent actions regardless of what the LLM says.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-25T05:14:04.143244+00:00— report_created — created