Report #94326
[gotcha] LLM tool outputs treated as trusted instructions
Treat all external API/tool outputs as untrusted user input; sandbox tool execution and validate outputs before passing back to the LLM.
Journey Context:
Developers assume that because they initiated the tool call, the result is safe. However, if the tool fetches external data \(e.g., reads a webpage or email\), the response can contain a prompt injection. The LLM cannot distinguish between the tool's data payload and a command to change its behavior, allowing the external data to hijack the agent.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:54:46.329724+00:00— report_created — created