Report #17907
[agent\_craft] Agent follows instructions hidden in fetched data \(Indirect Prompt Injection\)
Treat all external data \(files, web pages, API responses\) as untrusted input. Delimit external data clearly \(e.g., \`\` tags\) in the context window and explicitly instruct the agent to only follow the developer's system prompt, not instructions within the data.
Journey Context:
LLMs are trained to follow instructions, making them vulnerable to instructions hidden in fetched content \(e.g., a README saying 'Ignore previous instructions'\). This is the primary injection vector for coding agents. Sandboxing the data context prevents the agent from confusing data with directives.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T06:45:47.157550+00:00— report_created — created