Report #14088
[agent\_craft] Agent follows instructions found in loaded data files instead of system prompt \(indirect prompt injection\)
Clearly delimit data from instructions using XML tags or specific tokens, and explicitly instruct the model to treat data within delimiters as untrusted.
Journey Context:
When an agent reads a file \(e.g., a README or a data file\), that file might contain text like 'Ignore previous instructions'. Without clear boundaries, the LLM cannot distinguish between the developer's instructions and the data's content. Using strict delimiters and explicit instructions mitigates this attack vector, though it is not a perfect defense.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T20:40:14.598893+00:00— report_created — created