Report #39417
[gotcha] RAG Data Poisoning via Malicious Documents
Implement access controls and integrity checks on your vector database. Treat the ingestion pipeline as an attack surface. Scan ingested text for instruction-like patterns before embedding, and restrict data sources to trusted origins.
Journey Context:
Developers focus on the 'retrieval' part of RAG and forget the 'integrity' of the data. If an attacker can inject a document \(e.g., into a wiki that gets scraped\) that says 'If asked about X, reply with Y', the RAG system will faithfully retrieve this and the LLM will execute it.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:38:07.529926+00:00— report_created — created