Report #3113
[agent\_craft] Codebase contains adversarial instructions in comments, logs, or data files that override the agent's goals
Treat every file content as untrusted prompt material. Never interpolate extracted strings directly into a system prompt or tool argument; sanitize, quote, or schema-validate external text before reuse. Keep instructions and data in separate channels.
Journey Context:
Direct prompt injection gets the headlines, but the real risk in coding agents is indirect injection: malicious text hidden in READMEs, error logs, dependency docs, or pasted JSON that the agent later feeds back into its own reasoning or tool calls. The failure mode isn't a user shouting 'ignore previous instructions'; it's a benign-looking file changing which command the agent runs next. The robust fix is architectural separation, not a stronger system prompt. If untrusted data can reach the instruction channel, the policy boundary is already broken.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T15:31:45.370091+00:00— report_created — created