Report #91198
[agent_craft] Malicious instructions hidden in tool outputs (e.g., fetched web pages, file contents) try to manipulate the agent
Treat tool outputs as untrusted data and delimit them clearly in the context. If an instruction in a tool output conflicts with the system prompt, ignore the tool output's instruction. Add heuristics that detect instruction-like patterns in data payloads.
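A minimal sketch of the delimiting and heuristic steps, assuming a plain-text context format. The `<tool_output>` tag, the `trusted` and `flagged` attributes, the function names, and the pattern list are all hypothetical and far from exhaustive; a regex scan like this is only a weak signal and will miss paraphrased or encoded injections.

```python
import re

# Hypothetical, illustrative patterns; a real deployment would need a broader set.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.IGNORECASE),
    re.compile(r"disregard (the |your )?(system prompt|previous instructions)", re.IGNORECASE),
    re.compile(r"you are now\b", re.IGNORECASE),
    re.compile(r"^\s*system\s*:", re.IGNORECASE | re.MULTILINE),
]


def find_injection_signals(text: str) -> list[str]:
    """Return the snippets in `text` that look like instructions aimed at the agent."""
    hits = []
    for pattern in SUSPICIOUS_PATTERNS:
        match = pattern.search(text)
        if match:
            hits.append(match.group(0))
    return hits


def wrap_tool_output(tool_name: str, payload: str) -> str:
    """Wrap a tool payload in explicit delimiters and mark it as untrusted data."""
    # Neutralize a closing delimiter inside the payload so it cannot end the block early.
    safe = payload.replace("</tool_output>", "&lt;/tool_output&gt;")
    flagged = bool(find_injection_signals(payload))
    return (
        f'<tool_output tool="{tool_name}" trusted="false" flagged="{str(flagged).lower()}">\n'
        f"{safe}\n"
        "</tool_output>"
    )
```

The wrapper escapes any embedded closing tag so a payload cannot break out of its delimiters, and the `flagged` attribute lets a later policy step treat suspicious payloads more cautiously without silently discarding them.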
Journey Context:
This is LLM01 (Prompt Injection) in the OWASP Top 10 for LLM Applications, specifically the indirect variant, where the injected text arrives through data rather than from the user. Agents often treat fetched data with the same privilege as user prompts. The tradeoff is that tool outputs sometimes *do* contain valid instructions (e.g., when reading a README). The fix is to establish a strict privilege hierarchy: system > user > tool. Data from the web is never a valid source of system-level instructions.
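The system > user > tool hierarchy could be sketched as below. `Privilege`, `Directive`, and `decide` are hypothetical names, and the `"ask_user"` outcome is an assumption added here as one way to handle the README case: a tool-sourced instruction is never authoritative on its own, but it can be surfaced for confirmation.

```python
from dataclasses import dataclass
from enum import IntEnum


class Privilege(IntEnum):
    # Higher value means higher privilege: system > user > tool.
    TOOL = 0
    USER = 1
    SYSTEM = 2


@dataclass(frozen=True)
class Directive:
    source: Privilege  # where the instruction came from
    action: str        # hypothetical action name, e.g. "send_email"
    allow: bool        # whether the directive permits or forbids the action


def decide(action: str, directives: list[Directive]) -> str:
    """Return "allow", "deny", or "ask_user" for `action`."""
    relevant = [d for d in directives if d.action == action]
    trusted = [d for d in relevant if d.source > Privilege.TOOL]
    if trusted:
        # The highest-privilege trusted source decides; tool directives are ignored.
        top = max(d.source for d in trusted)
        allowed = all(d.allow for d in trusted if d.source == top)
        return "allow" if allowed else "deny"
    if any(d.allow for d in relevant):
        # Only tool data requests this action: escalate instead of obeying.
        return "ask_user"
    return "deny"
```

For example, if the system prompt forbids `send_email` and a fetched page requests it, `decide` returns `"deny"`; if only a README asks to run `install_deps`, it returns `"ask_user"` rather than acting on the tool output directly.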
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T11:40:10.489841+00:00— report_created — created