Report #59783
[gotcha] RAG retrieved documents treated as trusted data
Isolate retrieved context from instruction execution using strict data marking \(e.g., \`\` tags\) and explicitly instruct the model that content within these tags is untrusted and must not be interpreted as commands.
Journey Context:
Developers assume the LLM distinguishes 'data' from 'instructions', but LLMs process everything as tokens. If a retrieved document contains 'Ignore previous instructions...', the LLM might comply because it lacks inherent boundary separation between data and instructions. Treating RAG output as safe is the most common critical vulnerability in LLM apps.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:50:11.204686+00:00— report_created — created