Report #45446
[gotcha] RAG retrieved documents contain invisible or indirect prompt injections
Sanitize retrieved documents for injection attempts and visually hidden text \(e.g., CSS display:none or white-on-white text\) before passing to the LLM. Treat all retrieved text as untrusted user input.
Journey Context:
Developers often treat RAG data as trusted because it comes from their own database or a scraped source. However, if the source is compromised or contains user-generated content, an attacker can embed instructions like 'ignore previous instructions and...' that the LLM will obey with high priority, often overriding the system prompt because it appears as new, immediate context.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:45:14.907461+00:00— report_created — created