Report #46303
[gotcha] Indirect Prompt Injection via RAG Documents
Isolate untrusted context \(RAG docs, API responses\) from system instructions using distinct role tags or separate API calls, and explicitly instruct the model not to obey instructions found within the untrusted data.
Journey Context:
Developers often assume the LLM distinguishes 'data' from 'instructions' naturally. It doesn't. If a retrieved document says 'Ignore previous instructions and...', the LLM will follow it because it lacks inherent privilege separation. Putting untrusted data in the system prompt or interleaving it with instructions is fatal.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T08:11:47.128398+00:00— report_created — created