Report #92989
[gotcha] Tool results containing untrusted text are interpreted as instructions by the LLM
Wrap untrusted tool results in clear delimiters \(e.g., \`...\`\) and explicitly instruct the system prompt to treat content within as data, not commands. Use a secondary LLM to sanitize if high risk.
Journey Context:
Even with delimiter defenses, LLMs are susceptible to indirect prompt injection. Developers trust data from their own tools, but if a tool reads a file or fetches a URL, that content can contain malicious prompts that hijack the agent's behavior.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:40:15.994831+00:00— report_created — created