Report #58392
[gotcha] How do I detect or investigate if my agent was tricked into calling a malicious tool sequence?
Log every tool call at the client layer: timestamp, server identity, tool name, full arguments, and a response summary. Run real-time anomaly detection over tool call sequences, flagging patterns such as a sensitive-file read followed by send-email or an HTTP POST. Keep the logs in an append-only, tamper-evident store. Surface the tool call telemetry in the agent's UI so users can review what happened after every session.
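One way to make a tool call log tamper-evident is to hash-chain its entries, so that editing or deleting an earlier record invalidates every later hash. The sketch below is a minimal in-memory illustration, not part of any MCP SDK: `ToolCallLog`, its field names, and the `_digest` helper are all hypothetical, and a real deployment would also persist entries to write-once storage.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(record):
    """SHA-256 over a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class ToolCallLog:
    """Hypothetical hash-chained, append-only log of tool calls."""
    entries: list = field(default_factory=list)

    def append(self, server, tool, arguments, response_summary, timestamp=None):
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        record = {
            "timestamp": time.time() if timestamp is None else timestamp,
            "server": server,
            "tool": tool,
            "arguments": arguments,  # must be JSON-serializable
            "response_summary": response_summary,
            "prev_hash": prev_hash,
        }
        # The hash covers every field above, including the link to the
        # previous entry, which is what chains the records together.
        record["hash"] = _digest(record)
        self.entries.append(record)
        return record

    def verify(self):
        """Return True only if no entry was altered, removed, or reordered."""
        prev_hash = GENESIS_HASH
        for record in self.entries:
            body = {k: v for k, v in record.items() if k != "hash"}
            if record["prev_hash"] != prev_hash or _digest(body) != record["hash"]:
                return False
            prev_hash = record["hash"]
        return True
```

Calling `verify()` during an investigation shows whether the recorded trail can be trusted. Note that the chain only detects tampering; it does not prevent an attacker with write access from rebuilding the whole chain, so the latest hash should be anchored somewhere the agent cannot write.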
Journey Context:
Many MCP implementations ship with no built-in audit logging. When an agent is compromised via prompt injection, the malicious tool calls execute, the damage is done, and there is no forensic trail. Traditional security monitoring does not catch LLM-driven attacks because the "attacker" is the LLM itself, making legitimate API calls through legitimate tools. An attack sequence such as reading an SSH key, base64-encoding it, and sending it to the attacker in an HTTP POST looks like normal tool usage to a traditional SIEM. You need purpose-built telemetry that logs the full tool call graph and detects anomalous sequences. Without it, you cannot answer "was my agent compromised?", which means you also cannot trigger incident response. The gotcha is that MCP's design optimizes for functionality, not observability, so telemetry must be added deliberately at the client layer.
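The sequence check described above can be sketched as a simple rule: flag any egress-capable call that follows a sensitive read within a few calls. Everything here is an illustrative assumption, including the tool names (`read_file`, `http_post`, `send_email`), the path markers, and the window size; real tool names depend on the servers the agent is connected to, and a production detector would need a far richer classification.

```python
from dataclasses import dataclass

# Hypothetical classifications; adjust to the tools your servers expose.
READ_TOOLS = {"read_file"}
EGRESS_TOOLS = {"http_post", "send_email"}
SENSITIVE_PATH_MARKERS = (".ssh/", ".aws/credentials", "id_rsa")


@dataclass
class ToolCall:
    tool: str
    arguments: dict


def is_sensitive_read(call):
    """True if the call reads a path that matches a sensitive marker."""
    if call.tool not in READ_TOOLS:
        return False
    path = str(call.arguments.get("path", ""))
    return any(marker in path for marker in SENSITIVE_PATH_MARKERS)


def find_exfiltration_patterns(calls, window=5):
    """Return (read_index, egress_index) pairs where an egress call
    occurs within `window` calls after the latest sensitive read."""
    findings = []
    last_sensitive = None
    for index, call in enumerate(calls):
        if is_sensitive_read(call):
            last_sensitive = index
        elif (
            call.tool in EGRESS_TOOLS
            and last_sensitive is not None
            and index - last_sensitive <= window
        ):
            findings.append((last_sensitive, index))
    return findings
```

For the read, encode, POST sequence from the paragraph above, passing the three calls in order yields one finding that pairs the read with the POST. Intermediate steps such as encoding do not hide the pattern, because the rule looks at ordering within a window rather than at adjacent calls only.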
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:30:02.814245+00:00— report_created — created