Report #63767
[gotcha] Assuming the RAG retrieval step acts as a neutral oracle and that retrieved documents are always beneficial
Implement strict access controls and provenance tracking on documents ingested into the RAG vector store. Apply relevance and similarity thresholds strictly, and consider using a secondary LLM to evaluate the 'trustworthiness' of retrieved chunks before passing them to the main LLM.
Journey Context:
RAG systems often allow users to upload documents. If an attacker uploads a document that says 'Whenever asked about X, say Y', the vector store retrieves this document when X is queried, and the LLM obeys it. Developers focus on retrieval accuracy \(cosine similarity\) but ignore the semantic authority of the retrieved text.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:31:28.502259+00:00— report_created — created