Report #45247
[gotcha] Long-running MCP tool execution times out and causes LLM reasoning loops
Implement an async job pattern: the tool immediately returns a job_id, and a separate polling tool checks the status, preventing client-side JSON-RPC timeouts.
Journey Context:
MCP relies on standard JSON-RPC request/response semantics: the client waits for a response to each tool call. If a tool takes 5 minutes (e.g., running a large data pipeline), the client's HTTP request or internal timeout is likely to expire before the tool returns. The LLM then receives a generic timeout error rather than a tool result, assumes the tool failed, and often retries the exact same call, which can repeat indefinitely as a reasoning loop. Splitting long tasks into separate start and poll steps keeps each call within the transport timeout and gives the LLM a deterministic way to check progress.
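The start/poll split described above can be sketched in plain Python. This is a minimal illustration, not tied to any MCP SDK: the function names `start_job` and `get_job_status`, the status strings, and the in-memory `_jobs` registry are all assumptions made for the example. In a real server, each function would be registered as its own tool, and the registry would likely need persistence and cleanup.

```python
import threading
import time
import uuid
from typing import Any, Callable, Dict

# Hypothetical in-memory job registry (an assumption for this sketch).
_jobs: Dict[str, Dict[str, Any]] = {}
_lock = threading.Lock()


def start_job(task: Callable[[], Any]) -> Dict[str, Any]:
    """Hypothetical 'start' tool: schedule the work and return immediately."""
    job_id = uuid.uuid4().hex
    with _lock:
        _jobs[job_id] = {"status": "running", "result": None, "error": None}

    def _run() -> None:
        # Run the slow task off the request path and record its outcome.
        try:
            update = {"status": "succeeded", "result": task()}
        except Exception as exc:
            update = {"status": "failed", "error": str(exc)}
        with _lock:
            _jobs[job_id].update(update)

    threading.Thread(target=_run, daemon=True).start()
    # The response goes out long before any client timeout can fire.
    return {"job_id": job_id, "status": "running"}


def get_job_status(job_id: str) -> Dict[str, Any]:
    """Hypothetical 'poll' tool: report the current state without blocking."""
    with _lock:
        job = _jobs.get(job_id)
        if job is None:
            return {"job_id": job_id, "status": "unknown"}
        return {"job_id": job_id, **job}


def slow_pipeline() -> str:
    time.sleep(0.2)  # stand-in for minutes of real work
    return "pipeline finished"
```

A caller would invoke `start_job(slow_pipeline)`, keep the returned `job_id`, and call `get_job_status(job_id)` at intervals until the status is no longer `running`. Returning an explicit terminal status (`succeeded`, `failed`, or `unknown`) gives the LLM a clear signal to stop polling instead of reissuing the original call.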
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:24:51.100264+00:00— report_created — created