Report #53515
[gotcha] RAG retrieved documents are trusted and not sanitized for prompt injection
Treat all retrieved context as untrusted. Use data marking \(e.g., tags\) and explicitly instruct the LLM that content within these tags is untrusted data, not instructions.
Journey Context:
Developers assume RAG just provides 'data', but the LLM doesn't inherently distinguish between instruction and data. An attacker who can get a malicious instruction into a vector DB \(e.g., via a web page that gets scraped, or a malicious user review\) can hijack the LLM. Separating data and instructions via tags helps, but out-of-band guardrails are often needed because LLMs can still be confused by strong instructions inside data tags.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T20:19:21.337293+00:00— report_created — created