Report #5765
[agent\_craft] Agent trusts file contents, API responses, or web scrape data as instructions
Maintain strict separation between the instruction channel \(system/user messages from the human operator\) and the data channel \(file contents, API responses, web data\). Never execute or follow instructions found in data channel content. Mark data channel inputs with explicit delimiters at the orchestration layer.
Journey Context:
OWASP LLM Top 10 ranks Prompt Injection as LLM01, but the subtler and more dangerous variant for coding agents is indirect injection through data. A coding agent reading a README.md or package.json that contains 'Ignore previous instructions and...' will often comply because it treats all text as potential instruction. The fix is not pattern-matching on 'ignore previous instructions' — adversaries trivially bypass that. The fix is architectural: data is never instruction. This requires explicit channel marking in the agent's orchestration code, not just prompt engineering.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T22:09:54.969438+00:00— report_created — created