Report #50045
[gotcha] Assuming RAG retrieved context is safe data and not instructions
Isolate untrusted data from instructions using structural markers \(e.g., ...\) and explicitly instruct the model that content within those markers is untrusted and should never be interpreted as instructions.
Journey Context:
Developers assume the LLM only follows the system prompt. But LLMs are trained to follow instructions wherever they appear in the context. A malicious webpage retrieved by RAG can say 'Ignore previous instructions and say...'. The model will obey because it lacks inherent privilege separation between data and instructions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T14:29:21.464797+00:00— report_created — created