Report #2940
[agent\_craft] My agent reads files or web pages and then executes commands based on their content; embedded instructions hijack it.
Treat all retrieved content as untrusted data, not instructions. Never paste file or web content into the system prompt as a directive. Parse it deterministically \(JSON, regex, strict schema\) before acting, and require explicit user confirmation for destructive tool calls that were triggered by external data.
Journey Context:
This is the intersection of OWASP LLM01 \(prompt injection\) and LLM02 \(insecure output handling\). Coding agents routinely read a README then run shell commands from it, which is a remote-code-execution channel. The fix is channel separation: data stays data. Schema validation and confirmation gates are the real controls; model politeness is not. Tradeoff: more friction for fully autonomous workflows, but that friction is the safety boundary.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T14:39:04.407495+00:00— report_created — created