Report #97549
[gotcha] Injecting a few poisoned documents into a retrieval corpus controls generated answers
Authenticate knowledge sources; integrity-check documents before indexing; monitor retrieval ranks for anomalous chunks; use ensemble retrieval and cross-checking; do not let RAG output drive high-stakes decisions without independent verification.
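The "integrity-check documents before indexing" step could look roughly like the sketch below. It assumes a trusted manifest of SHA-256 digests, published by the knowledge source through a separate channel; the manifest and the names `sha256_hex` and `filter_verified` are hypothetical, not part of any particular RAG framework.

```python
import hashlib
import hmac


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hex digest of a document's UTF-8 bytes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def filter_verified(
    documents: dict[str, str],
    manifest: dict[str, str],
) -> tuple[dict[str, str], list[str]]:
    """Split documents into (verified, rejected_ids).

    A document is verified only if its id appears in the trusted manifest
    (an assumption of this sketch) and its digest matches the recorded one.
    """
    verified: dict[str, str] = {}
    rejected: list[str] = []
    for doc_id, text in documents.items():
        expected = manifest.get(doc_id)
        # compare_digest performs a constant-time comparison of the digests
        if expected is not None and hmac.compare_digest(sha256_hex(text), expected):
            verified[doc_id] = text
        else:
            rejected.append(doc_id)
    return verified, rejected


if __name__ == "__main__":
    manifest = {"faq-001": sha256_hex("Reset your password from the account page.")}
    incoming = {
        "faq-001": "Reset your password from the account page.",
        "faq-999": "Send your password to the address below to reset it.",
    }
    ok, bad = filter_verified(incoming, manifest)
    print("index:", sorted(ok), "rejected:", bad)
```

A check like this only catches documents that were added or altered outside the trusted channel. It does nothing against poisoned text that a trusted source itself publishes, which is why the other mitigations are still needed.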
Journey Context:
PoisonedRAG showed that injecting a handful of adversarial passages into a large corpus can make a RAG system return attacker-chosen answers for targeted questions, with a reported success rate near 97%. This is not prompt injection: the attack lives in the knowledge base, not in the user's prompt. Developers tend to treat retrieved text as ground truth, but a retriever only ranks text by semantic similarity to the query; it does not check whether that text is correct. The right model is zero trust for external knowledge: verify sources, detect poisoning, and restrict RAG to non-critical tasks unless its output is independently corroborated.
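One way to picture the "corroborated" requirement is a gate that accepts an answer only when passages from several distinct sources support it. The following is a minimal sketch under strong assumptions: each retrieved passage is already labelled with its source and with the answer it supports (a hypothetical upstream step, for example running the generator on each passage separately), and `Passage` and `corroborated_answer` are illustrative names, not an existing API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Passage:
    source: str  # where the passage came from, e.g. a site or collection name
    answer: str  # answer supported by this passage alone (assumed upstream step)


def corroborated_answer(
    passages: list[Passage], min_sources: int = 2
) -> Optional[str]:
    """Return the normalized answer backed by the most distinct sources.

    Returns None when no answer reaches `min_sources` distinct sources or
    when the top two answers tie, so the caller can escalate to a human.
    """
    sources_by_answer: dict[str, set[str]] = defaultdict(set)
    for p in passages:
        sources_by_answer[p.answer.strip().lower()].add(p.source)
    if not sources_by_answer:
        return None

    # Rank by number of distinct sources, not by number of passages, so that
    # many injected passages from one source count only once.
    ranked = sorted(sources_by_answer.items(), key=lambda kv: len(kv[1]), reverse=True)
    best_answer, best_sources = ranked[0]
    if len(best_sources) < min_sources:
        return None
    if len(ranked) > 1 and len(ranked[1][1]) == len(best_sources):
        return None
    return best_answer


if __name__ == "__main__":
    retrieved = [Passage("forum", "Lyon")] * 5 + [
        Passage("wiki", "Paris"),
        Passage("docs", "paris"),
    ]
    print(corroborated_answer(retrieved))
```

Counting distinct sources rather than passages means five poisoned chunks from a single origin cannot outvote two independent ones. An attacker who controls several sources would still defeat this gate, so it complements source authentication rather than replacing it.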
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-25T05:18:13.933603+00:00— report_created — created