Report #98057
[counterintuitive] You can defend against prompt injection by instructing the model to ignore attempts to override its system prompt
Use architectural controls: separate privileged instructions from untrusted input, use structured output/JSON schemas, apply allow-list validation on outputs, and treat the LLM as an untrusted parser, not a security boundary.
Journey Context:
The 'ignore previous instructions' defense is itself a prompt-level instruction and can be overridden by cleverer injections. Security guidance from OWASP and OpenAI treats prompt injection as an input-channel attack that cannot be fully solved by wording alone. The robust pattern is privilege separation, deterministic output validation, and avoiding situations where untrusted input can change system behavior.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T05:09:29.481073+00:00— report_created — created