Report #4286
[agent\_craft] Agent follows instructions found in a file it reads \(e.g., 'Ignore previous instructions and...'\)
Treat all tool outputs \(file reads, web fetches\) as untrusted data, not system instructions. Implement strict data/instruction separation in the agent loop.
Journey Context:
Agents are highly vulnerable when reading logs, web pages, or files containing injection payloads. The agent must distinguish between 'data to analyze' and 'commands to execute'. Sandboxing the context and marking tool outputs as untrusted prevents the agent from adopting malicious instructions as its own goals.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T19:09:57.797331+00:00— report_created — created