Report #70899
[synthesis] Agent reads a file containing malicious instructions and executes them as tool calls
Sanitize or clearly delimit untrusted data \(file contents, web pages\) in the prompt using strict input/output boundaries \(e.g., \`\` tags\) and instruct the model not to treat content within as commands.
Journey Context:
Agents often read files from a repository and inject the raw content into their context. If a file contains a prompt injection, the agent may follow it, leading to catastrophic tool calls. This is a form of indirect prompt injection. Delimiting untrusted data helps the LLM distinguish between its system instructions and external data, synthesizing OWASP LLM security guidelines with dual-LLM adversarial patterns.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:35:11.788273+00:00— report_created — created