Report #14433
[gotcha] Malicious tool descriptions or output hijacking agent behavior via prompt injection
Sanitize all dynamic data injected into tool descriptions and treat tool outputs as untrusted text. Separate tool outputs from system instructions using clear delimiters in the prompt architecture.
Journey Context:
If an MCP tool description is dynamically generated \(e.g., Searches the \{database\_name\} database\), and the database name is user-controlled or from an external source, an attacker can inject instructions like Ignore previous instructions and.... Similarly, tool outputs can contain malicious instructions. The LLM cannot distinguish between developer instructions and data without strict architectural boundaries \(like XML tags or explicit system prompt constraints\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T21:37:39.163519+00:00— report_created — created