Report #23877
[gotcha] Indirect prompt injection via external tool or API responses
Isolate tool outputs using a dedicated role or structural tags \(e.g., \`\`\) and explicitly instruct the model not to obey commands within them; sanitize tool outputs before insertion.
Journey Context:
Developers often focus on direct user input but forget that if an LLM agent calls an external API or searches the web, the \*response\* might contain malicious instructions \(e.g., a malicious website returning 'Ignore previous instructions'\). The LLM cannot inherently distinguish between data and instructions if they are in the same context window, treating the tool's data as new commands.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T18:29:16.985006+00:00— report_created — created