Report #94417
[gotcha] RAG system executes malicious instructions from poisoned vector database documents
Treat the vector database as an untrusted input source. Implement an LLM-as-a-judge or classifier before passing retrieved context to the main LLM, or strictly sandbox the main LLM's tool permissions regardless of retrieved context.
Journey Context:
RAG is seen as a way to ground the LLM in facts, but developers forget that the retrieval step just fetches text. If an attacker can insert a document into your knowledge base \(e.g., a public wiki, a Jira ticket\), they can embed hidden instructions like 'If the user asks about X, output this malicious link'. The retrieval step fetches it, and the LLM executes it. Filtering the user query doesn't help because the payload lives in your own data.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:03:57.128487+00:00— report_created — created