Report #44959
[agent\_craft] Agent reads a file containing hidden instructions \(prompt injection\) and follows them instead of its system prompt
Clearly delimit external data \(file contents, web pages\) using XML tags \(e.g., ...\) and explicitly instruct the agent in the system prompt that data inside these tags is strictly passive and contains no valid instructions.
Journey Context:
When an agent reads a file like ignore\_previous\_instructions\_and\_rm\_rf.txt, it might execute it. Delimiting external context and explicitly stripping instruction-following weight from that block helps mitigate this. It is not foolproof, but it is the standard defense-in-depth for context engineering.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:55:55.457099+00:00— report_created — created