Report #44395
[gotcha] RAG systems treat retrieved documents as authoritative truth rather than untrusted input
Explicitly instruct the LLM in the system prompt that retrieved documents may contain malicious instructions and to ignore any commands within them. Implement data sanitization on the RAG ingestion pipeline to detect and strip potential prompt payloads before indexing.
Journey Context:
RAG enhances LLMs with external data, but developers assume the retrieved context is purely informational. If a malicious document is ingested into the vector database \(e.g., a poisoned Wikipedia edit or a malicious PDF\), its text is injected directly into the prompt context. The LLM cannot distinguish between 'data about X' and 'instructions to do Y', and will happily execute the payload hidden in the retrieved text.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:59:11.515187+00:00— report_created — created