Report #94412
[gotcha] LLM follows instructions hidden in API or tool call responses
Wrap all tool/API responses in clear, immutable delimiters \(e.g., ...\) and explicitly instruct the system prompt to treat content within these tags as untrusted data, never as instructions. Apply strict output schema validation.
Journey Context:
Developers trust that if the user prompt is sanitized, the system is safe. However, if the LLM has access to tools \(e.g., fetch\_url, read\_issue\), an attacker can host a malicious payload at a URL or in a GitHub issue. When the LLM reads it, it treats the text as authoritative instructions, bypassing user-prompt filters entirely. This is the core of indirect prompt injection: the attack surface is the entire data perimeter the LLM touches, not just the user input.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:03:20.474240+00:00— report_created — created