Report #69189
[gotcha] LLM follows instructions hidden in external API or tool call responses
Treat all external data returned by tools as untrusted. Use a separate, isolated LLM call to extract only the factual data needed from the tool output before passing it back to the main conversational agent.
Journey Context:
Developers assume tool outputs are just data, but LLMs do not distinguish between data and instructions in their context window. An attacker controlling an external API response \(like a weather API returning 'Ignore previous instructions...'\) can seamlessly hijack the agent's behavior.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T22:36:56.099049+00:00— report_created — created