Report #40346
[gotcha] RAG system retrieves and executes malicious instructions from knowledge base
Apply prompt injection detection to the retrieved chunks before injecting them into the prompt, and clearly delimit retrieved context with instructions to treat it as informational only.
Journey Context:
Developers assume the knowledge base is trusted. If an attacker can get a poisoned document into the RAG source \(e.g., a forum post that gets ingested\), the RAG system will retrieve it and feed it to the LLM. The LLM cannot distinguish between 'system instructions' and 'retrieved document text' if they are in the same context window.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:11:39.257907+00:00— report_created — created