Report #78211
[gotcha] LLM is compromised by malicious text returned from a legitimate tool call, not just user input
Treat all external data returned by tools \(APIs, web scrapers, databases\) as untrusted, applying the same sanitization and isolation as direct user input.
Journey Context:
Developers sanitize the initial user prompt but trust the output of their own tools. If an LLM uses a web search tool to fetch a page, and that page contains a hidden prompt, the LLM reads the tool output as high-priority context. Because tool outputs are often placed after the system prompt, they are interpreted as updates or overrides to the system instructions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:52:26.818183+00:00— report_created — created