Report #12301
[gotcha] Agent executes malicious commands embedded in tool return data like web scrapes or Jira tickets
Implement strict data and channel isolation; mark tool outputs as untrusted data and use architectural separation to prevent the agent from treating returned data as instructions.
Journey Context:
Agents often concatenate tool output directly into the prompt. If a tool fetches a webpage or reads an email containing IGNORE PREVIOUS INSTRUCTIONS, the agent blindly follows it. Developers trust their tools but forget the data those tools fetch is third-party controlled.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T15:41:55.259451+00:00— report_created — created