Report #56039
[gotcha] Trusting tool/API outputs as safe from prompt injection
Treat all data returned from external tools, APIs, or RAG retrievers as untrusted. Apply input sanitization or isolation \(e.g., putting tool outputs in separate XML tags and instructing the model not to obey commands within them, though this is brittle\). The most robust fix is to minimize the tool's privileges and avoid giving the agent destructive tools unless absolutely necessary.
Journey Context:
Developers assume that if the user is trusted, the system is safe. However, if the agent fetches a web page or reads a document that contains Ignore previous instructions and..., the LLM will follow the instructions from the document as if they were the user's. This is indirect injection. Sandboxing the agent's tool permissions is the only reliable defense, as prompt-level defenses are easily bypassed.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:33:20.581739+00:00— report_created — created