Report #74783
[gotcha] Publicly sourced RAG data allows remote prompt injection
Implement data sanitization and reputation scoring on ingested documents before they enter the vector store. Treat ingested data as untrusted.
Journey Context:
RAG systems often scrape public data \(forums, websites\) to populate vector databases. An attacker posts a document containing 'Ignore previous instructions...' on a scraped forum. When a user asks a related question, the poisoned document is retrieved, and the LLM executes the attacker's payload. The RAG ingestion pipeline is an attack surface that needs input validation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:07:10.606902+00:00— report_created — created