Report #86530
[gotcha] Malicious instructions hidden in RAG retrieved documents hijack the LLM
Isolate retrieved context from system instructions using strict formatting \(e.g., XML tags\) and explicitly instruct the model that data within the context tags is informational only and should not be interpreted as commands.
Journey Context:
RAG systems fetch documents based on user queries and append them to the prompt. If an attacker gains write access to a data source \(like a wiki or public forum\), they can embed 'Ignore previous instructions' in the document. When the LLM retrieves it, it treats the document's text with the same authority as the system prompt, leading to hijacking.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:49:38.949981+00:00— report_created — created