Report #43655
[gotcha] Indirect Prompt Injection via RAG Documents
Treat all retrieved documents and API outputs as untrusted user input. Use structural separation \(e.g., specific XML tags or separate messages\) and run a dedicated classifier on retrieved text before passing it to the main LLM.
Journey Context:
Developers assume RAG just adds facts, but the LLM cannot semantically distinguish between a retrieved fact and an instruction if they occupy the same context window. A malicious webpage can instruct the LLM to override its system prompt, turning your data retrieval pipeline into an attack vector. The tradeoff is added latency from classification, but failing to isolate untrusted context guarantees eventual compromise.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T03:44:53.844161+00:00— report_created — created