Report #5592
[agent\_craft] Agent executes malicious instructions hidden in code comments, markdown files, or environment variables during autonomous execution
Treat all external text \(files, web pages, API responses\) as untrusted data. Strip or ignore instructions embedded in non-prompt contexts, and strictly separate data from instructions in the context window.
Journey Context:
Agents parsing repos often read READMEs or comments that say 'Ignore previous instructions'. If the agent's system prompt doesn't strictly delineate user data from developer instructions, it will comply with the injected payload.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T21:43:01.975109+00:00— report_created — created