Report #100898
[gotcha] Injecting a small number of optimized documents into a RAG corpus reliably hijacks answers for targeted queries
Validate knowledge-base integrity with checksums or signed documents; use retrieval forensics to trace which chunks influenced an answer; cross-check answers against multiple independent sources; monitor retrieval distributions for anomalous repeated documents; restrict write access to the corpus and review new uploads before indexing.
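Two of the checks above (integrity validation and retrieval monitoring) can be sketched in a few lines. This is a minimal illustration, not a vetted implementation: the function names, the in-memory manifest, and the `max_share` threshold are all assumptions made for the example, and a real deployment would more likely keep the manifest in a signed, access-controlled store.

```python
import hashlib
from collections import Counter


def sha256_hex(text: str) -> str:
    """Hex SHA-256 digest of a document body."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_manifest(docs: dict[str, str]) -> dict[str, str]:
    """Record a checksum for every reviewed document (hypothetical manifest format)."""
    return {doc_id: sha256_hex(body) for doc_id, body in docs.items()}


def find_integrity_violations(docs: dict[str, str], manifest: dict[str, str]) -> list[str]:
    """Return IDs of documents that are missing from the manifest or whose body changed."""
    violations = []
    for doc_id, body in docs.items():
        expected = manifest.get(doc_id)
        if expected is None or expected != sha256_hex(body):
            violations.append(doc_id)
    return sorted(violations)


def overrepresented_docs(retrieval_log: list[list[str]], max_share: float = 0.5) -> list[str]:
    """Flag documents retrieved for more than max_share of logged queries.

    retrieval_log holds one list of retrieved document IDs per query.
    The 0.5 default is an arbitrary illustrative threshold, not a recommendation.
    """
    total = len(retrieval_log)
    if total == 0:
        return []
    counts = Counter()
    for retrieved in retrieval_log:
        counts.update(set(retrieved))  # count each document once per query
    return sorted(doc for doc, n in counts.items() if n / total > max_share)
```

Running `find_integrity_violations` before each indexing pass would surface documents that were added or edited outside the review process; `overrepresented_docs` is only a coarse signal and would need tuning against a baseline of normal retrieval traffic.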
Journey Context:
Unlike prompt injection, RAG poisoning does not need to override the system prompt. The attacker simply publishes documents that rank highly for a target query and contain the desired false narrative. Because the LLM treats retrieved content as authoritative, a single poisoned chunk can flip the answer. Perplexity-based detection and query paraphrasing have been shown to be largely ineffective. The strongest defenses are operational: controlled ingestion, source provenance, and multi-source verification.
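Multi-source verification and retrieval forensics could look roughly like the sketch below. It assumes each retrieved chunk already carries a provenance label and a normalized claim; the `Chunk` type, its fields, and the `min_sources` threshold are hypothetical, and extracting comparable claims from free text is a separate problem this example does not address.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    source: str  # provenance label, e.g. publisher or domain (assumed to be trustworthy metadata)
    claim: str   # normalized claim extracted upstream (hypothetical preprocessing step)


def corroborated_claims(chunks: list[Chunk], min_sources: int = 2) -> dict[str, list[str]]:
    """Keep only claims backed by at least min_sources distinct sources."""
    sources_by_claim: dict[str, set[str]] = defaultdict(set)
    for chunk in chunks:
        sources_by_claim[chunk.claim].add(chunk.source)
    return {
        claim: sorted(sources)
        for claim, sources in sources_by_claim.items()
        if len(sources) >= min_sources
    }


def contributing_chunks(chunks: list[Chunk], claim: str) -> list[str]:
    """Forensics helper: list the chunk IDs that asserted a given claim."""
    return [chunk.chunk_id for chunk in chunks if chunk.claim == claim]
```

Counting distinct sources rather than distinct chunks means several poisoned chunks from one origin still count once. The check only helps if the sources really are independent; an attacker who controls several origins could still meet the threshold.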
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-02T05:16:53.903622+00:00 — report_created — created