Report #3119
[agent_craft] Retrieved content or user files contain hidden instructions that hijack the agent
Treat every retrieved byte as untrusted data: wrap it in XML or JSON delimiters with a source tag, scan for instruction patterns, and never let retrieved content override system-level goals or tool allow-lists.
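A minimal sketch of the wrapping and scanning steps, assuming a JSON envelope is used as the delimiter. The function names, the envelope fields, and the pattern list are all hypothetical illustrations, not part of any particular framework; a real pattern list would need to be far broader and regularly maintained.

```python
import json
import re

# Hypothetical patterns for illustration only; regex matching is a
# heuristic and will miss paraphrased or obfuscated injections.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.IGNORECASE),
    re.compile(r"you are now\b", re.IGNORECASE),
    re.compile(r"\bsystem prompt\b", re.IGNORECASE),
]


def scan_for_instructions(text: str) -> list:
    """Return the pattern strings that match anywhere in the text."""
    return [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(text)]


def wrap_untrusted(text: str, source: str) -> str:
    """Serialize retrieved text as a JSON object tagged with its source.

    JSON encoding escapes quotes and control characters, so the content
    cannot close the delimiter early and pose as surrounding prompt text.
    """
    envelope = {
        "type": "untrusted_retrieved_content",  # assumed label
        "source": source,
        "flags": scan_for_instructions(text),
        "content": text,
    }
    return json.dumps(envelope, ensure_ascii=False)
```

The wrapped string would be placed in the prompt as data, with the system prompt stating that anything inside such an envelope is never to be followed as an instruction. A non-empty `flags` list could be used to drop the document or route it for review.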
Journey Context:
The same retrieval (RAG) pipeline that supplies context also opens an attack surface. An issue ticket, README, or dependency doc can contain instructions aimed at the model. Agents have executed malicious commands because the model trusted injected text. Use defense in depth: delimiters reduce confusion between data and instructions, allow-lists constrain which actions are possible, and output validation catches deviations before they take effect. This is not paranoia: prompt injection is ranked first (LLM01) in the OWASP Top 10 for LLM Applications.
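The allow-list and output-validation layers can be sketched as a check that runs on every tool call the model proposes, before anything executes. This is an illustrative sketch under stated assumptions: `ToolCall`, `ALLOWED_TOOLS`, `partition_calls`, and the tool names are hypothetical and do not come from any specific agent framework.

```python
from dataclasses import dataclass, field

# Hypothetical allow-list, fixed by the system configuration and never
# modified by anything that appears in retrieved content.
ALLOWED_TOOLS = frozenset({"read_file", "search_docs"})


@dataclass
class ToolCall:
    """A tool invocation proposed by the model (assumed shape)."""
    name: str
    args: dict = field(default_factory=dict)


def partition_calls(calls, allowed=ALLOWED_TOOLS):
    """Split proposed calls into (approved, rejected) by tool name."""
    approved, rejected = [], []
    for call in calls:
        if call.name in allowed:
            approved.append(call)
        else:
            rejected.append(call)
    return approved, rejected
```

Only the approved calls would be executed; rejected calls are best logged and surfaced, since an unexpected tool request is a useful signal that retrieved text may have steered the model. Checking the tool name alone is a coarse filter, so argument validation (for example, restricting file paths) would be a natural next layer.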
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T15:32:43.878218+00:00— report_created — created