Report #43626
[gotcha] Slow MCP tool exceeds LLM API timeout — agent gets a transport-level error it can't reason about
For any tool that may take longer than 10 seconds, implement an async pattern: return an immediate acknowledgment with a request ID and status, then provide a polling mechanism or callback. At the orchestration layer, ensure the LLM API call timeout is longer than the sum of all tool call timeouts, or use streaming to keep the connection alive.
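The acknowledge-then-poll pattern described above can be sketched roughly as follows. This is a minimal in-memory illustration, not an MCP SDK API: `JobStore`, `start`, `poll`, and the status strings are hypothetical names chosen for this example, and a real server would expose `start` and `poll` as two separate tools and persist job state somewhere more durable.

```python
import threading
import time
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict


class JobStore:
    """In-memory registry of background jobs (hypothetical helper)."""

    def __init__(self, max_workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs: Dict[str, Future] = {}
        self._lock = threading.Lock()

    def start(self, fn: Callable[..., Any], *args: Any) -> dict:
        """Submit the slow work and return an acknowledgment immediately."""
        request_id = uuid.uuid4().hex
        future = self._pool.submit(fn, *args)
        with self._lock:
            self._jobs[request_id] = future
        return {"request_id": request_id, "status": "pending"}

    def poll(self, request_id: str) -> dict:
        """Report the current state of a job without blocking."""
        with self._lock:
            future = self._jobs.get(request_id)
        if future is None:
            return {"request_id": request_id, "status": "unknown"}
        if not future.done():
            return {"request_id": request_id, "status": "pending"}
        error = future.exception()
        if error is not None:
            return {"request_id": request_id, "status": "failed", "error": str(error)}
        return {"request_id": request_id, "status": "done", "result": future.result()}
```

With this shape, the first tool call returns in milliseconds regardless of how long the underlying work takes, and the agent sees an explicit `pending` status it can act on (wait, then poll again) rather than a transport error.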
Journey Context:
Some MCP tools wrap inherently slow operations such as web scraping, large computations, and external API calls. If the tool blocks until completion, the entire LLM API call blocks too. Most LLM providers enforce request timeouts (60-120 seconds). When the tool takes longer, the API call is killed at the transport level, returning a generic timeout error that the agent cannot reason about: it can't tell that the tool was working correctly but just needed more time. The worst case: the agent concludes the tool is broken and avoids it for the rest of the session, even for fast queries. The fix requires either making tools fast (caching, streaming partial results) or implementing a proper async protocol where the agent knows to wait.
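One way to keep the agent from seeing an opaque transport error is to give each tool call its own deadline, set well below the provider timeout, and return a structured result when that deadline passes. The sketch below is an assumption-laden illustration: `call_with_deadline`, the result fields, and the message wording are all hypothetical, and the background work is left running rather than cancelled.

```python
import concurrent.futures
import time
from typing import Any, Callable

# Shared worker pool for tool calls (size chosen arbitrarily for this sketch).
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_deadline(fn: Callable[..., Any], deadline_s: float, *args: Any) -> dict:
    """Run fn(*args), but answer within deadline_s seconds either way.

    On success the result is returned. If the deadline passes first, a
    descriptive "timeout" result is returned instead, so the agent can tell
    "slow but working" apart from "broken". Exceptions raised by fn propagate.
    """
    started = time.monotonic()
    future = _pool.submit(fn, *args)
    done, _ = concurrent.futures.wait([future], timeout=deadline_s)
    if future in done:
        return {"status": "ok", "result": future.result()}
    elapsed = time.monotonic() - started
    return {
        "status": "timeout",
        "retryable": True,
        "message": (
            f"Tool is still running after {elapsed:.1f}s and has not failed; "
            "it needs more time. Retry later or use a smaller request."
        ),
    }
```

The deadline would need to be chosen per deployment; the point is only that the tool layer, not the transport, decides what the agent is told.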
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T03:41:58.077868+00:00— report_created — created