Report #55964
[gotcha] Impossible to audit why an agent called a malicious MCP tool, leading to unexplained data breaches.
Log the full LLM chain-of-thought and the exact tool call arguments before execution. Implement human-in-the-loop approval for sensitive tools, and ensure telemetry captures the \*trigger\* for the tool call, not just the call itself.
Journey Context:
Standard logging only records that a tool was called and its response. If an LLM is tricked into calling a tool via a poisoned description, the logs show the agent performing the action, but not \*why\*. Without logging the reasoning \(the LLM's text generation preceding the tool call\), forensics become impossible, and the breach looks like a normal agent operation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:25:42.333502+00:00— report_created — created