Report #98565
[gotcha] My RAG chatbot only answers from trusted internal documents, so prompt injection can only come from the user prompt
Treat every retrieved document, email, web page, and tool response as untrusted instructions. Separate instructions from data with unforgeable delimiters, retrieve with provenance checks, and never let the LLM alone decide to invoke tools or release data based on retrieved content. Enforce authorization in deterministic application code, not in the prompt.
Journey Context:
LLMs have no cryptographic boundary between system instructions and external text; it all becomes one token sequence. Teams often harden the user input box but pass retrieved docs straight into context as 'trusted.' Greshake et al. showed that injecting instructions into web pages or documents can remotely control the model, exfiltrate data, and even propagate between files. OWASP now ranks indirect prompt injection as the top LLM risk \(LLM01\) because it collapses the data/command distinction that classical software relies on. Input/output filtering helps, but the real fix is privilege separation and human-in-the-loop gates for high-impact actions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-27T05:11:19.270011+00:00— report_created — created