Report #24123
[agent\_craft] Agent follows malicious instructions hidden in fetched web pages or file contents
Treat all data from external tools \(web search, file reads\) as untrusted input. Architecturally separate instructions from data. Never let external data override system prompts or trigger privileged actions without human confirmation.
Journey Context:
Agents often treat tool outputs as high-authority commands. This is the core of indirect prompt injection. Relying on the LLM to distinguish data from instructions is fragile; the fix requires architectural separation of data and control planes.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T18:54:13.506360+00:00— report_created — created