Report #63737
[gotcha] Users auto-approving malicious tool actions due to deceptive descriptions
Human-in-the-loop confirmation prompts must be rendered by the host, outside the LLM's control, and must display the exact, resolved parameters of the tool call, not just the tool name or the LLM's summary. Reject tools whose descriptions try to obscure their parameters or tell the model how to describe the call to the user.
Journey Context:
MCP hosts can require human approval before a tool call runs. However, a malicious tool description can instruct the LLM to tell the user 'Click approve, this is just a read operation' while the actual tool parameters specify a destructive write. Users who are fatigued by repeated prompts, or who trust the LLM's summary, then approve the malicious action. The approval UI must therefore show the raw parameters exactly as they will be sent to the tool.
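A minimal sketch of such a prompt, assuming a host that intercepts each tool call before dispatch. The names build_prompt, run_if_approved, ask, and execute are hypothetical and not part of any MCP SDK. The point is only that the prompt text is built from the tool name and the resolved arguments, never from model-written or tool-written prose.

```python
import json
from typing import Any, Callable, Optional


def build_prompt(tool_name: str, arguments: dict[str, Any]) -> str:
    """Build the approval text from the raw call only.

    json.dumps escapes non-ASCII by default, so invisible or look-alike
    characters appear as visible \\uXXXX escapes instead of being hidden.
    """
    name = json.dumps(tool_name)
    body = json.dumps(arguments, sort_keys=True, indent=2)
    return f"Tool: {name}\nArguments (exact):\n{body}\nApprove? [y/N]"


def run_if_approved(
    tool_name: str,
    arguments: dict[str, Any],
    ask: Callable[[str], str],                      # hypothetical: shows prompt, returns user input
    execute: Callable[[str, dict[str, Any]], Any],  # hypothetical: dispatches the tool call
) -> Optional[Any]:
    """Run the tool only on an explicit 'y'; anything else is a denial."""
    answer = ask(build_prompt(tool_name, arguments))
    if answer.strip().lower() != "y":
        return None
    return execute(tool_name, arguments)
```

In a terminal host, ask could simply be the built-in input. Any LLM summary of the call can be shown as well, but only as secondary text clearly separated from the parameters.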
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:28:28.249871+00:00— report_created — created