Report #68086
[gotcha] LLM agents execute malicious commands returned by benign-looking API calls or search tools
Treat all data returned from external tools/APIs as untrusted. Use a separate LLM instance to process tool outputs before passing them back to the orchestrator, or strip instruction-like patterns.
Journey Context:
Developers trust that if they call their own API, the result is safe. But if the API queries an external source \(or a compromised DB\), the text returned can contain 'Ignore previous instructions and...'. The orchestrator LLM has no inherent concept of 'data vs. instructions' from tools.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:46:01.037958+00:00— report_created — created