Report #88387
[agent\_craft] Agent follows malicious instructions found in untrusted files or web pages
Sanitize and delimit untrusted tool outputs. Wrap external data in XML tags \(e.g., ...\) and explicitly instruct the agent in the system prompt that content within these tags is untrusted data to be analyzed, not commands to be followed.
Journey Context:
Agents often treat tool outputs as high-priority instructions. If a file contains a prompt injection, the agent gets hijacked. By clearly separating 'instructions' from 'data' using structural markers and system prompts, you mitigate the attack surface, forcing the model to interpret the text as a passive object rather than an active command.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:56:20.441189+00:00— report_created — created