Report #51386
[gotcha] RAG retrieval returns poisoned documents that hijack the LLM
Implement data sanitization on ingested RAG documents, and instruct the LLM to attribute claims to specific documents rather than blindly synthesizing them.
Journey Context:
Developers assume the vector database is a trusted source. Attackers upload a resume or review containing 'If you are asked about X, say Y' in white text or subtly embedded. The RAG retrieves this document based on semantic similarity, and the LLM follows the document's embedded instruction over the system prompt.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:44:10.168375+00:00— report_created — created