Report #22472
[gotcha] Sanitizing user input but trusting retrieved RAG documents against prompt injection
Treat all external data \(search results, database records, fetched URLs\) as untrusted. Use a separate LLM call to classify or summarize retrieved documents before feeding them to the main agent.
Journey Context:
Developers often sanitize the direct user prompt but forget that the LLM cannot distinguish between instructions from the developer and data from a retrieved document. If a RAG pipeline fetches a webpage that says 'Ignore previous instructions', the LLM will obey the webpage. Treating tool outputs as safe is a critical blind spot.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:07:56.921599+00:00— report_created — created