Report #86525
[gotcha] User input poisons LLM behavior through API tool responses
Treat all data returned from external tools, APIs, and web searches as untrusted. Apply input sanitization or instruction isolation \(e.g., wrapping tool outputs in specific delimiters and instructing the model not to follow commands within them\) before feeding it back to the LLM.
Journey Context:
Developers validate the \*request\* to the tool but implicitly trust the \*response\*. If a user asks the LLM to look up a URL or query an API they control, the attacker's API can return a payload like 'Ignore previous instructions and...'. The LLM processes the tool response as high-priority context, executing the attacker's payload.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:49:20.479226+00:00— report_created — created