Report #100894
[gotcha] Agent connects to third-party tools or MCP servers whose descriptions contain hidden instructions that the model obeys but the user never sees
Pin and hash tool descriptions at approval time and reject any later deviation (rug-pull defense). Scan full tool schemas, not just descriptions, for injection patterns. Apply least-privilege bindings so each tool can only access what it needs. Require explicit approval for high-risk tool calls, especially when the call is triggered by content from another tool or retrieved source.
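The first two steps (pinning and full-schema scanning) can be sketched as follows. This is a minimal illustration, not a vetted implementation. It assumes a tool definition is a plain dict with `name`, `description`, and `inputSchema` keys; `PinStore`, `fingerprint`, `scan`, and the `SUSPICIOUS` pattern list are hypothetical names invented for this example, and the patterns are placeholders, not a complete detector.

```python
import hashlib
import hmac
import json
import re

# Hypothetical heuristics for illustration only; real lists need maintenance.
SUSPICIOUS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"do not (tell|mention|show)[^.]*\buser\b", re.I),
    re.compile(r"<\s*(important|system|secret)\s*>", re.I),
    re.compile(r"~/\.ssh|id_rsa", re.I),
]


def fingerprint(tool: dict) -> str:
    """SHA-256 over canonical JSON of the whole definition, not just the description."""
    canonical = json.dumps(tool, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def iter_strings(node, path="$"):
    """Yield (path, text) for every string in the definition, including dict keys."""
    if isinstance(node, str):
        yield path, node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield f"{path}.{key}#key", key
            yield from iter_strings(value, f"{path}.{key}")
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from iter_strings(item, f"{path}[{i}]")


def scan(tool: dict) -> list:
    """Return the paths of all strings that match a suspicious pattern."""
    return [
        path
        for path, text in iter_strings(tool)
        if any(p.search(text) for p in SUSPICIOUS)
    ]


class PinStore:
    """Remembers the fingerprint of each tool at the moment the user approved it."""

    def __init__(self):
        self._pins = {}

    def approve(self, server: str, tool: dict) -> None:
        self._pins[(server, tool["name"])] = fingerprint(tool)

    def is_unchanged(self, server: str, tool: dict) -> bool:
        pinned = self._pins.get((server, tool["name"]))
        # Unknown tools and any drift from the pinned hash are both rejected.
        return pinned is not None and hmac.compare_digest(pinned, fingerprint(tool))
```

Hashing the whole definition means a change hidden in a nested parameter description trips the pin just as a changed top-level description would. The scan is best treated as a tripwire before approval, since it will miss phrasings it has no pattern for.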
Journey Context:
In agent frameworks the LLM chooses tools based on natural-language descriptions. An attacker who controls a server or publishes a malicious skill can embed instructions in those descriptions, causing the agent to, for example, read sensitive files and pass their contents as tool arguments. The UI often truncates or hides the description, so the user approves a friendly-looking tool name. This is indirect prompt injection with persistence: the malicious description loads every session. Defenses must treat tool metadata as untrusted and constrain what the tool can do, because detecting every possible hidden instruction is a losing game.
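Constraining what a tool can do, rather than trying to detect every hidden instruction, might look like the policy gate below. It is a sketch under stated assumptions: tools are assumed to take a single POSIX file path argument, and `Binding`, `Call`, `decide`, and the `triggered_by_untrusted` flag are hypothetical names for this example, not part of any agent framework. The sketch asks the user whenever a call is high-risk or was prompted by untrusted content, which is one possible reading of the approval rule.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Binding:
    allowed_roots: tuple      # directories this tool may touch
    high_risk: bool = False   # e.g. tools that send data off the machine


@dataclass(frozen=True)
class Call:
    tool: str
    path: str
    # True when the call was prompted by tool output or retrieved content
    triggered_by_untrusted: bool


def decide(call: Call, bindings: dict) -> str:
    """Return "deny", "ask_user", or "allow" for a proposed tool call."""
    binding = bindings.get(call.tool)
    if binding is None:
        return "deny"  # no binding means no access (default deny)

    target = PurePosixPath(call.path)
    # Reject relative paths and ".." segments instead of trying to resolve them.
    if not target.is_absolute() or ".." in target.parts:
        return "deny"

    ancestors = (target, *target.parents)
    if not any(PurePosixPath(root) in ancestors for root in binding.allowed_roots):
        return "deny"

    if binding.high_risk or call.triggered_by_untrusted:
        return "ask_user"
    return "allow"
```

With this shape, a poisoned description that talks the model into requesting `~/.ssh/id_rsa` fails at the binding check regardless of how the instruction was worded, and calls that originate from another tool's output surface to the user before they run.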
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-02T05:16:41.679374+00:00— report_created — created