Report #7413
[gotcha] Slow MCP tools time out with no error surfaced to the model for self-correction
Implement explicit timeout handling that returns a structured error the model can reason about (e.g., 'Tool X timed out after 30s. The operation may still be running. Consider using an async pattern or reducing the scope.'). For long-running operations, use MCP progress notifications to keep the connection alive and inform the model.
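A minimal sketch of the timeout handling described above, using only the Python standard library rather than any particular MCP SDK. The names `call_with_timeout`, `tool_fn`, and `timeout_s` are hypothetical; the returned dictionary follows the general shape of an MCP tool result (`content` plus `isError`), so the failure reaches the model as text it can act on instead of an exception or a silent drop.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def call_with_timeout(
    tool_name: str,
    tool_fn: Callable[[], Awaitable[str]],  # hypothetical: the wrapped tool body
    timeout_s: float,
) -> dict[str, Any]:
    """Run tool_fn and return a tool-result-shaped dict.

    On timeout, return an error result with an actionable message
    instead of raising or returning nothing.
    """
    try:
        text = await asyncio.wait_for(tool_fn(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for cancels the local coroutine, but work already started on
        # an external system may still be in progress, so the message says so.
        return {
            "isError": True,
            "content": [
                {
                    "type": "text",
                    "text": (
                        f"Tool {tool_name} timed out after {timeout_s:g}s. "
                        "The operation may still be running. Consider using "
                        "an async pattern or reducing the scope."
                    ),
                }
            ],
        }
    return {"isError": False, "content": [{"type": "text", "text": text}]}
```

Returning the error as a normal result, rather than raising, is a design choice: it keeps the agent loop alive and gives the model a concrete reason not to repeat the identical call.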
Journey Context:
MCP tool calls can hang for many reasons: slow external APIs, large file I/O, network issues. When a tool exceeds the client's timeout, implementations vary wildly: some silently drop the result, some return a generic error string, and some crash the entire agent loop. The model then has no context about what happened and may retry the same call (causing another timeout), assume the tool is broken, or hallucinate a result. MCP's built-in progress notification mechanism (notifications/progress) lets a server report that work is still under way, which can keep the request alive on clients that reset their timeout when progress arrives and gives the model information to reason about when the client surfaces it. A timeout should return a structured, actionable error rather than fail silently.
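The progress mechanism mentioned above can be sketched as follows. This is an illustration under stated assumptions, not an SDK example: `send` is a hypothetical transport hook that writes one JSON-RPC message to the client, `process_items` is a hypothetical long-running tool body, and the `asyncio.sleep(0)` call stands in for real work. The message shape follows the `notifications/progress` notification, and the sketch assumes the progress token is the one the client supplied with its request; when the client supplied none, no notifications are sent.

```python
import asyncio
import json
from typing import Awaitable, Callable, Optional, Sequence, Union

# Hypothetical transport hook: sends one serialized JSON-RPC message.
Send = Callable[[str], Awaitable[None]]
ProgressToken = Union[str, int]


def progress_notification(
    token: ProgressToken, progress: float, total: float, message: str
) -> str:
    """Serialize a notifications/progress message for the given token."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "method": "notifications/progress",
            "params": {
                "progressToken": token,
                "progress": progress,
                "total": total,
                "message": message,
            },
        }
    )


async def process_items(
    items: Sequence[str], token: Optional[ProgressToken], send: Send
) -> int:
    """Process items one by one, reporting progress after each."""
    done = 0
    for item in items:
        await asyncio.sleep(0)  # stand-in for the real per-item work
        done += 1
        if token is not None:  # only report when the client asked for it
            await send(
                progress_notification(token, done, len(items), f"processed {item}")
            )
    return done
```

Whether these notifications actually extend the deadline depends on the client, so a maximum timeout with a structured error is still worth keeping alongside them.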
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T02:41:00.354262+00:00— report_created — created