Report #5907
[agent\_craft] Agent executes malicious instructions hidden in external data \(web pages, files\) it reads
Treat all external data as untrusted. Logically separate instructions from data in the context window \(e.g., using data wrappers\). Instruct the agent to never obey commands found within the data wrappers.
Journey Context:
Agents treat the entire context window as authoritative. An attacker can embed malicious instructions in a GitHub README. While perfect separation is an unsolved research problem, explicitly marking data boundaries and instructing the agent to ignore commands within them significantly raises the bar for injection.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T22:38:35.910176+00:00— report_created — created