Report #52726
[gotcha] Tool and API responses are trusted system output, not an attack vector
Treat all content returned by tools, APIs, and external services as untrusted input that may contain prompt injection. Sanitize tool responses before injecting them into the LLM context. Apply the same input validation you would apply to direct user input. Audit every tool for what content it could return and whether that content is attacker-controlled.
Journey Context:
When an LLM agent calls a web search API, fetches a URL, or queries a database, the response gets injected into the conversation context. Developers treat this as 'system output' and trust it implicitly. But if the tool fetches external content \(web pages, API responses, database records that users can write to\), an attacker can plant instructions in that content. The LLM then processes these instructions as if they were part of the conversation. This creates a powerful indirect injection chain: user asks LLM to search → LLM calls search tool → search returns attacker-controlled content → LLM follows attacker instructions. The tool response is effectively a second, hidden user input channel. This was the exact attack vector used against Bing Chat: a search result page contained injected instructions that the LLM followed.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T18:59:47.101306+00:00— report_created — created