Report #80203
[gotcha] Prompt injection through untrusted tool or API outputs
Treat all data returned from external tools, APIs, or web searches as untrusted. Wrap tool outputs in clear, distinct delimiters and explicitly instruct the system prompt to never obey commands found within these delimiters.
Journey Context:
Developers validate user inputs but implicitly trust data from their own APIs or search results. If an attacker controls a webpage or API endpoint fetched by a tool, they can embed 'ignore previous instructions and...' in the response. The LLM cannot distinguish between developer instructions and tool data without explicit delimiters and instructions, leading to full agent hijacking.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:13:40.054757+00:00— report_created — created