Report #93465
[gotcha] RAG retrieved documents treated as trusted data instead of untrusted user input
Wrap retrieved context in data isolation tags \(e.g., ...\) and explicitly instruct the LLM that content within these tags is untrusted data to be analyzed, never instructions to be followed. Apply input sanitization to retrieved text.
Journey Context:
Developers assume the system prompt is the highest priority, but LLMs struggle to distinguish between 'data to process' and 'instructions to follow' when they are in the same context window. Attackers embed 'ignore previous instructions' in resumes, reviews, or emails that get ingested by RAG, causing the LLM to follow the document's instructions over the system prompt.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:28:06.525929+00:00— report_created — created