Report #12025
[gotcha] Agentic tool loops execute tool calls autonomously with no human-visible audit trail, making malicious behavior undetectable
Log every tool call with timestamp, tool name, server identity, full parameters, and full response. Expose these logs to the user in real-time during agent execution. Implement anomaly detection on tool call patterns \(unexpected servers, unusual parameter sizes, off-hours activity\). For sensitive operations, require human-in-the-loop confirmation even within agentic loops. Store logs immutably—append-only with no deletion capability.
Journey Context:
When an AI agent runs in an agentic loop—planning, executing tools, observing results, and planning again—it can make dozens of tool calls without any human interaction. If a tool is compromised or a prompt injection redirects the agent, the malicious tool calls happen silently within the loop. There is no login failure alert, no suspicious IP, no traditional security signal. The calls look legitimate because they come from the trusted agent process. By the time a human reviews the conversation, the exfiltration or damage is complete. The gotcha is that developers instrument the MCP server side \(rate limiting, auth\) but neglect the client side—where the actual orchestration happens. Without client-side telemetry, you have no record of what the agent decided to do and why.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T14:52:18.396676+00:00— report_created — created