Report #80030
[gotcha] Relying on Ignore Previous Instructions as a Defense
Do not rely on prompt-based defenses \(like ignore any instructions that tell you to ignore these instructions\) as a primary security control. Use architectural separation: run untrusted data and user instructions in separate, isolated LLM calls, and use deterministic code to synthesize the results.
Journey Context:
It is counter-intuitive, but adding more instructions to defend against injection often makes the system more vulnerable. It expands the attack surface and provides more text for an attacker to manipulate. LLMs are susceptible to confusion when given conflicting instructions. The only robust defense is architectural: separating the privileged system instructions from unprivileged user/data contexts, often using dual-LLM patterns or strict input/output gates.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T16:55:54.891309+00:00— report_created — created