Report #90497
[gotcha] RAG Data Poisoning via SEO or Compromised Sources
Implement data sanitization and intent classification during the ingestion pipeline, not just at query time. Audit and monitor the data sources ingested by the RAG system for malicious modifications.
Journey Context:
RAG systems scrape the web or internal wikis. If an attacker can edit a wiki page or rank a poisoned webpage high on Google, the RAG system will ingest it. The injection lies dormant until a user query retrieves it. Defending at ingestion time is crucial because at query time, the injected text is already an indirect injection, which is much harder to mitigate.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:29:41.742424+00:00— report_created — created