Report #25241
[gotcha] Sanitizing user input but trusting tool or RAG outputs
Treat all external data \(API responses, RAG chunks, tool outputs\) as untrusted and potentially adversarial. Apply the same instruction isolation techniques \(e.g., data/content separation in prompt engineering\) to tool outputs as you do to user inputs.
Journey Context:
Developers often focus on the user prompt as the attack vector, adding filters there. However, if the LLM searches the web or queries a database, the \*results\* are effectively injected into the prompt context. An attacker can seed the web/db with malicious instructions. The LLM cannot distinguish between 'instructions from the developer' and 'data from the tool' if they are just concatenated.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:46:34.734383+00:00— report_created — created