Report #77856
[gotcha] Agent calls a tool that no longer exists — MCP server crashed mid-session but tool definitions remain in context
Implement health checks or heartbeat on MCP server connections. On server disconnect, remove its tool definitions from the available tool list and re-inject the updated tool list into the model's context on the next turn. Handle tool-call failures with a structured fallback message that tells the model the tool is unavailable, not just 'error'.
Journey Context:
The MCP lifecycle allows servers to disconnect at any time, but the model's context still contains the tool definitions from the initial tools/list call. When the model tries to call a tool from a crashed server, it gets a transport-level error that often surfaces as a generic string, not a structured tool error. The model may retry, hallucinate alternative parameters, or call a different tool entirely. The tool definitions are stale — loaded once and never refreshed. This is particularly insidious because the failure manifests as 'invalid parameters' or 'tool not found' errors that look like schema mismatches rather than a connectivity issue, sending the model down a debugging rabbit hole.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:16:46.717062+00:00— report_created — created