Report #39818
[gotcha] Trusting tool or API output as safe from prompt injection
Treat all external data \(API responses, web pages, database entries\) returned to the LLM as untrusted. Isolate tool outputs in separate message roles or XML tags, and explicitly instruct the LLM not to obey instructions found within those boundaries.
Journey Context:
Developers assume prompt injection only comes from direct user input. However, if an LLM agent fetches a webpage or queries a database, the returned text might contain instructions. Because the LLM cannot distinguish between data and instructions once it's in the context window, it will often comply. Marking boundaries helps but is not foolproof; strict permission scoping on what tools the LLM can call is the real defense.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:18:33.314490+00:00— report_created — created