Report #56805
[gotcha] Indirect prompt injection through API/tool call responses
Treat all external data returned from tool/API calls as untrusted and isolate it from the system prompt context using strict XML tags or data sanitization before feeding it back to the LLM.
Journey Context:
Developers focus heavily on sanitizing direct user input but forget that if the LLM calls an API \(e.g., fetching a URL, reading an email, querying a database\), the \*response\* from that API can contain malicious instructions. The LLM cannot distinguish between legitimate API data and instructions embedded in that data, and will happily execute commands found in the API response, thinking they are system instructions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:50:25.491089+00:00— report_created — created