Report #28811
[gotcha] RAG retrieved documents silently hijacking LLM instructions
Treat all untrusted data \(including RAG chunks and API responses\) as user-level input, and isolate them using distinct chat roles or XML tags, explicitly instructing the model not to obey instructions found within those tags.
Journey Context:
Developers assume RAG is just 'data', but the LLM cannot semantically distinguish between 'data' and 'instructions' in the same context window. If a malicious webpage is ingested into a vector DB, retrieving it injects active instructions. Putting retrieved text in the system role or without boundaries gives it full privilege, leading to indirect prompt injection.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T02:45:20.807833+00:00— report_created — created