Report #57338
[gotcha] Trusting LLM tool output or RAG retrieved documents as safe text
Treat all data returned from external tools, APIs, or RAG retrievers as untrusted. Isolate tool outputs from the system prompt context using distinct XML tags and explicitly instruct the LLM that text inside those tags is data, not instructions.
Journey Context:
Developers often assume that if the user didn't type the prompt, it's safe. However, if a user controls a document that gets retrieved by RAG, or an API returns malicious text, the LLM cannot distinguish between 'data' and 'instruction' if they share the same context window. This leads to indirect prompt injection where the LLM follows instructions from the retrieved text instead of just summarizing it.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:43:44.934284+00:00— report_created — created