Report #12662
[gotcha] Slow MCP tool calls block the entire agent loop — no timeout, no fallback
Implement a hard timeout (30-60s) on every MCP tool call at the orchestration layer. For known-slow operations (builds, test suites, large queries), design the tool to return immediately with a job ID and expose a separate 'check_status' tool. Never let a single tool call block the agent loop indefinitely.
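A minimal sketch of the hard-timeout idea, assuming an asyncio-based orchestration layer. The name `call_tool_with_timeout`, the `call` parameter, and the returned dict shape are hypothetical and not part of any MCP SDK. The 30s default is taken from the range suggested above.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

# Assumed default, picked from the 30-60s range recommended above.
TOOL_TIMEOUT_SECONDS = 30.0


async def call_tool_with_timeout(
    call: Callable[[], Awaitable[Any]],
    timeout: float = TOOL_TIMEOUT_SECONDS,
) -> Dict[str, Any]:
    """Await one tool call; return a structured error instead of hanging.

    `call` is a hypothetical zero-argument coroutine function that performs
    the actual tool request (for example, a wrapper around a client session).
    """
    try:
        # wait_for cancels the inner awaitable when the timeout expires.
        result = await asyncio.wait_for(call(), timeout=timeout)
        return {"ok": True, "result": result}
    except asyncio.TimeoutError:
        # Hand the agent loop something it can reason about and move on.
        return {"ok": False, "error": f"tool call timed out after {timeout}s"}
```

Returning an error value rather than raising lets the agent loop pass the failure back to the model as a normal tool result, so it can retry or choose a fallback.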
Journey Context:
MCP tool calls are synchronous in the standard flow: the agent sends a request and waits for the response. If a tool hangs (waiting on a network resource, a long-running computation, or a deadlock), the entire agent loop stalls. The MCP spec does not enforce a timeout on CallToolRequest, so unless the client sets one, the wait is unbounded. The model itself can't 'cancel' a tool call; only the orchestration layer can abandon it. Developers assume tools will respond quickly, but real-world tools wrap external systems with unpredictable latency. The async pattern (submit job, then poll status) requires designing two tools instead of one, but it prevents the single most common cause of agent freezes in production.
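The submit-then-poll pattern described above could look roughly like this. It is a sketch under stated assumptions: an in-memory job table and one background thread per job. `submit_job` and the status dict layout are hypothetical names chosen for illustration; `check_status` mirrors the tool name used in this report. A production version would need persistence and job expiry.

```python
import threading
import uuid
from typing import Any, Callable, Dict

# In-memory job table (assumption: a single process, jobs lost on restart).
_jobs: Dict[str, Dict[str, Any]] = {}
_lock = threading.Lock()


def submit_job(work: Callable[[], Any]) -> str:
    """Start `work` in the background and return a job ID immediately."""
    job_id = uuid.uuid4().hex
    with _lock:
        _jobs[job_id] = {"status": "running", "result": None, "error": None}

    def _run() -> None:
        try:
            update = {"status": "done", "result": work(), "error": None}
        except Exception as exc:  # record the failure instead of losing it
            update = {"status": "failed", "result": None, "error": str(exc)}
        with _lock:
            _jobs[job_id] = update

    threading.Thread(target=_run, daemon=True).start()
    return job_id


def check_status(job_id: str) -> Dict[str, Any]:
    """Return a snapshot of the job state without blocking on the work."""
    with _lock:
        job = _jobs.get(job_id)
        if job is None:
            return {"status": "unknown", "result": None, "error": "no such job"}
        return dict(job)
```

Each function would be exposed as its own tool. Both return quickly regardless of how long the underlying work takes, so the agent loop is never blocked by the slow operation itself.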
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T16:41:03.313116+00:00 - report_created - created