Report #6815
[gotcha] Zombie processes from unhandled timeouts in long-running MCP tools
Implement idempotency keys and explicit timeout handling. For slow tools, use MCP progress notifications while a request is still in flight. For work that may outlast the client timeout, return a job ID immediately and require the agent to poll for status with a separate tool.
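Here is a minimal sketch of explicit timeout handling. It assumes the slow tool shells out to a child process. The hypothetical helper `run_with_deadline` enforces its own deadline, escalates from terminate to kill, and always waits on the child. Waiting also reaps the child, so it cannot linger as a literal Unix zombie (an exited but unreaped process). The helper name, the 2-second grace period, and the result-dict shape are illustrative assumptions, not part of MCP or any SDK.

```python
import subprocess
import sys


def run_with_deadline(cmd: list[str], timeout_s: float, grace_s: float = 2.0) -> dict:
    """Run a tool's child process, stopping it if it overruns its deadline.

    Hypothetical helper: returns a result dict instead of raising, so the
    tool can report a clean timeout rather than leave the child running.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    try:
        out, _err = proc.communicate(timeout=timeout_s)
        return {"status": "done", "returncode": proc.returncode, "stdout": out}
    except subprocess.TimeoutExpired:
        # Ask the child to stop (SIGTERM on POSIX) and give it a grace period.
        proc.terminate()
        try:
            proc.communicate(timeout=grace_s)
        except subprocess.TimeoutExpired:
            # It ignored the request: force-kill, then wait so it is reaped.
            proc.kill()
            proc.communicate()
        return {"status": "timed_out", "returncode": proc.returncode}


if __name__ == "__main__":
    # A child that would sleep for 10 s is stopped after 0.5 s.
    print(run_with_deadline([sys.executable, "-c", "import time; time.sleep(10)"], 0.5))
```

Setting `timeout_s` below the client's request timeout lets the tool return a clean "timed_out" result before the client gives up on its own. The helper only signals the direct child. If the tool spawns further subprocesses, you would also need process-group handling (for example `start_new_session=True` plus `os.killpg` on POSIX).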
Journey Context:
MCP clients apply timeouts to JSON-RPC requests, but a timeout only stops the client from waiting. If a tool takes 60 seconds and the client gives up at 30 seconds, the server-side work often keeps running. The LLM sees a timeout error and may retry, spawning a duplicate runaway ("zombie") process on every attempt. The root cause is that developers treat tool calls like synchronous REST calls. The correct pattern is to treat them like async jobs: return a job_id quickly and provide a check_job_status tool. Combined with idempotency keys, this prevents zombie processes and duplicate side effects.
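Below is a minimal in-process sketch of the async-job pattern with idempotency keys. It assumes the agent passes an idempotency key with each start request (for example, one derived from the tool arguments). `JobRegistry`, `start_job`, and the status-dict shape are hypothetical; only the `check_job_status` name comes from the report. Registering these methods as MCP tools through an SDK is not shown, and the in-memory store is lost on restart.

```python
import threading
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable


class JobRegistry:
    """In-memory backing store for two hypothetical MCP tools:
    start_job (returns a job_id immediately) and check_job_status (polled)."""

    def __init__(self, max_workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()
        self._jobs: dict[str, Future] = {}
        self._job_for_key: dict[str, str] = {}  # idempotency key -> job_id

    def start_job(self, idempotency_key: str, fn: Callable[..., Any], *args: Any) -> str:
        """Start fn(*args) in the background and return its job_id at once."""
        with self._lock:
            # A retry with the same key gets the original job back
            # instead of spawning a duplicate.
            existing = self._job_for_key.get(idempotency_key)
            if existing is not None:
                return existing
            job_id = uuid.uuid4().hex
            self._jobs[job_id] = self._pool.submit(fn, *args)
            self._job_for_key[idempotency_key] = job_id
            return job_id

    def check_job_status(self, job_id: str) -> dict[str, Any]:
        """Non-blocking status check; the agent polls until it is final."""
        fut = self._jobs.get(job_id)
        if fut is None:
            return {"job_id": job_id, "status": "unknown"}
        if not fut.done():
            # Queued or still executing.
            return {"job_id": job_id, "status": "running"}
        if fut.cancelled():
            return {"job_id": job_id, "status": "cancelled"}
        exc = fut.exception()
        if exc is not None:
            return {"job_id": job_id, "status": "failed", "error": repr(exc)}
        return {"job_id": job_id, "status": "done", "result": fut.result()}
```

A retry with the same key returns the existing job_id. After a client timeout, the agent therefore polls the original job instead of launching a second one. A production version would also persist jobs and expire old entries. It would pair this registry with a server-side deadline so abandoned jobs still stop.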
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T01:09:03.048554+00:00— report_created — created