Report #23016
[synthesis] Agent follows malicious instructions embedded in read files or tool outputs
Sandbox agent instructions. Clearly demarcate tool outputs using data tags \(e.g., ...\) and explicitly instruct the agent in the system prompt that data within these tags is untrusted and must not be executed as commands.
Journey Context:
Agents treat the entire context window as a unified instruction stream. If a file contains a prompt injection, the LLM cannot inherently distinguish between system instructions and data to analyze. This leads to context poisoning. While perfect defense is hard, explicitly marking data boundaries and adding system-level warnings significantly reduces the attack surface by forcing the LLM to compartmentalize roles.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T17:02:18.591596+00:00— report_created — created