Report #100418
[gotcha] Can an attacker who controls documents in my knowledge base make the RAG return wrong answers or injected instructions?
Treat knowledge-base write access as a security boundary. Sanitize and verify ingested documents, use retrieval-time anomaly detection, diversify retrievers, and validate answers against trusted sources. Segment corpora by trust level so untrusted documents cannot dominate high-stakes queries.
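One way to picture the trust-level segmentation above is a cap on how many untrusted chunks may enter the prompt context. The following is a minimal sketch, not a vetted defense: the `Chunk` type, the `trusted` flag, and the `select_context` helper are hypothetical names introduced for illustration, and it assumes every chunk was tagged with its source's trust level at ingestion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    score: float   # similarity score from the retriever, higher is better
    trusted: bool  # hypothetical flag, assumed to be set at ingestion


def select_context(candidates, k=4, max_untrusted=1, high_stakes=False):
    """Pick up to k chunks by score while capping untrusted ones.

    For high-stakes queries the cap drops to zero, so only trusted
    chunks can reach the generator.
    """
    cap = 0 if high_stakes else max_untrusted
    selected = []
    untrusted_used = 0
    for chunk in sorted(candidates, key=lambda c: c.score, reverse=True):
        if len(selected) == k:
            break
        if not chunk.trusted:
            if untrusted_used >= cap:
                continue  # skip: untrusted quota already spent
            untrusted_used += 1
        selected.append(chunk)
    return selected
```

With a cap like this, an attacker who lands several high-similarity poisoned chunks still fills at most `max_untrusted` context slots, and none for queries flagged as high-stakes. How a query gets flagged as high-stakes is left open here.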
Journey Context:
RAG isn't just retrieval; it's an injection surface. Poisoned documents can be crafted so that they are retrieved for targeted queries, crowd out correct answers (jamming), or carry injected instructions into the prompt. Vector similarity is not a security filter: chunks can be adversarially written to embed close to the queries an attacker wants to hijack. Defenses must span ingestion, retrieval, and generation; a clean user query offers no protection if the corpus is poisoned.
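As an illustration of the ingestion layer, the sketch below screens chunks for instruction-like text before they are indexed. The pattern list and the `screen_chunk` helper are assumptions made for this example, not a known-complete rule set; regex heuristics like these are easy to evade and can only serve as one layer among the others described above.

```python
import re
import unicodedata

# Illustrative patterns only (an assumption, not an exhaustive list).
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"disregard (the |your )?(system|previous) (prompt|instructions)", re.I),
    re.compile(r"you are now\b", re.I),
    re.compile(r"^\s*(system|assistant)\s*:", re.I | re.M),
]


def normalize(text):
    """Fold compatibility forms and drop invisible format characters."""
    text = unicodedata.normalize("NFKC", text)
    # Category "Cf" covers zero-width spaces and similar hidden characters
    # that can be used to split trigger phrases.
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


def screen_chunk(text):
    """Return the cleaned text plus a quarantine decision for review."""
    cleaned = normalize(text)
    hits = [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(cleaned)]
    return {"text": cleaned, "quarantine": bool(hits), "matched": hits}
```

Quarantined chunks would go to manual review rather than being silently dropped, since legitimate documents (for example, security write-ups) can contain the same phrases.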
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-01T05:11:29.240514+00:00— report_created — created