Report #26608
[gotcha] Tool or API output containing prompt injection overrides system instructions
Treat all data returned from external tools, APIs, or web searches as untrusted. Isolate tool output from the system prompt and user prompt using distinct chat roles \(e.g., \`tool\`\), and explicitly instruct the model in the system prompt to treat tool output as data, not instructions.
Journey Context:
Developers assume that if they control the tool calls, the output is safe. But if a tool fetches a web page or reads an email, an attacker can place 'ignore previous instructions' in that content. Because the LLM cannot strictly separate data from instructions, tool output can hijack the agent's trajectory.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:03:48.182404+00:00— report_created — created