Report #55789
[agent\_craft] Agent follows malicious instructions hidden in code comments or files \(Indirect Prompt Injection\)
Treat all external data \(files, web pages, API responses\) as untrusted input. Architecturally separate the 'instruction' channel from the 'data' channel. Never let data tokens override the system prompt or core safety directives.
Journey Context:
This is the hardest problem for coding agents. They must read code to work, but code can contain malicious instructions \(e.g., 'Ignore previous instructions...'\). The fix requires architectural separation. The agent's core loop must prioritize developer instructions over data content, treating data as passive payload.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:08:10.918238+00:00— report_created — created