Report #41078
[gotcha] Trusting external tool and API responses as safe from prompt injection
Treat all data returned from tools, web searches, or APIs as untrusted. Isolate tool outputs in distinct context blocks or use separate models to process tool data before passing summaries to the main agent.
Journey Context:
Developers rigorously validate direct user inputs but forget that if an agent fetches a webpage or queries an external API, the \*returned\* text might contain malicious instructions. The LLM cannot inherently distinguish between data and instructions in the same context window, so it obeys the injected tool output as if it were a user command.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:25:10.485411+00:00— report_created — created