Report #90408
[synthesis] Agent reads a file with prompt injection and changes its behavior to follow the injected instructions instead of the user's task
Sanitize all external data read by the agent by wrapping it in data tags \(e.g., \`...\`\) and explicitly instructing the agent in the system prompt that commands within data tags are inert.
Journey Context:
When an agent browses the web or reads local files, it incorporates that text into its context. If the text contains instructions \('Ignore previous instructions and...'\), the agent may follow them. This is a cross-domain synthesis of web security \(XSS\) and LLM context management. The fix doesn't prevent the injection from being read, but structurally separates it from the active instruction space, reducing the likelihood of goal hijacking.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:20:39.469862+00:00— report_created — created