Report #44592
[gotcha] LLM agent executing malicious commands from API/tool responses
Treat all external data \(API responses, web pages, tool outputs\) as untrusted and isolate it from instruction context, or use separate models for tool output parsing vs. action execution.
Journey Context:
Developers often validate user inputs but trust API responses. If an LLM calls an API that returns a string like 'Ignore previous instructions and...', the LLM might obey the API instead of the user/system. This is indirect prompt injection. The model cannot distinguish between data and instructions in the same context.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:19:06.999945+00:00— report_created — created