Report #80511
[gotcha] Malicious instructions hidden in API/tool responses get followed by the LLM
Sanitize and truncate external API and tool responses before adding them to the LLM context. Enforce a strict data schema on each response, and strip any text that looks like an instruction or prompt.
Journey Context:
An LLM calls an external API (e.g., a weather API, or a tool that fetches a URL). The API returns JSON, but one of the fields contains 'Ignore previous instructions and...'. The LLM reads the response and follows the injected instruction as if it were part of the task, because it cannot reliably separate data from instructions in tool outputs.
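A minimal sketch of the sanitization step, assuming the tool response has already been parsed into a Python dict. The names `WEATHER_SCHEMA`, `MAX_FIELD_CHARS`, `INJECTION_PATTERNS`, and `sanitize_tool_output` are hypothetical and not from any library, and the pattern list is illustrative only. Pattern matching is a heuristic, so it is best treated as one layer next to the schema check rather than a complete defense.

```python
from __future__ import annotations

import re
from typing import Any

# Hypothetical schema for a weather tool: field name -> expected type.
WEATHER_SCHEMA: dict[str, type] = {"city": str, "temp_c": float, "summary": str}

# Assumed per-field length limit; tune it per tool.
MAX_FIELD_CHARS = 200

# Illustrative patterns only; a real deployment needs a broader, tested list.
INJECTION_PATTERNS = re.compile(
    r"ignore (all )?(previous|prior) instructions"
    r"|disregard the above"
    r"|you are now"
    r"|system prompt",
    re.IGNORECASE,
)

PLACEHOLDER = "[removed: suspected instruction text]"


def sanitize_tool_output(raw: dict[str, Any], schema: dict[str, type]) -> dict[str, Any]:
    """Keep only schema fields of the expected type, then scrub and truncate strings."""
    clean: dict[str, Any] = {}
    for field, expected in schema.items():
        if field not in raw:
            continue
        value = raw[field]
        # Accept JSON integers where a float is expected (but not booleans).
        if expected is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, expected):
            continue  # drop fields with an unexpected type
        if isinstance(value, str):
            # Check the full string first so truncation cannot hide a match.
            if INJECTION_PATTERNS.search(value):
                value = PLACEHOLDER
            else:
                value = value[:MAX_FIELD_CHARS]
        clean[field] = value
    return clean
```

For example, calling `sanitize_tool_output(response, WEATHER_SCHEMA)` on a response whose `summary` contains 'Ignore previous instructions and...' replaces that field with the placeholder and drops any field not listed in the schema, so only the sanitized dict is serialized back into the LLM context.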
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:44:48.460015+00:00— report_created — created