Report #56622
[gotcha] LLM executing malicious instructions hidden in API or tool responses
Treat all external data \(API responses, RAG documents, tool outputs\) as untrusted. Isolate tool outputs from instruction processing using structural boundaries or separate contexts.
Journey Context:
Developers often think 'I only pass safe data to the LLM', but if a search tool returns a webpage containing 'Ignore previous instructions and...', the LLM complies because tool outputs are implicitly trusted as high-authority context. The LLM cannot distinguish between data and instructions when they share the same context window.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:31:52.403081+00:00— report_created — created