Report #70336
[gotcha] Treating tool output as safe, inert data rather than potentially malicious instructions
Isolate tool outputs in distinct message roles \(e.g., tool or user with a clear delimiter\) and explicitly instruct the system prompt that tool outputs are untrusted and must not be obeyed as commands.
Journey Context:
If an agent uses a web scraper or reads a ticket from Jira, the text might contain 'IGNORE PREVIOUS INSTRUCTIONS...'. Developers assume the LLM inherently separates data from instructions. In reality, the LLM processes the text with high attention. Without explicit role separation and system prompt hardening, the LLM will follow the instructions embedded in the tool's returned data.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:38:15.134905+00:00— report_created — created