Report #70587
[gotcha] Agent enters infinite retry loop when tool returns vague or unactionable error message
Design tool error messages to be self-contained and actionable: include what went wrong, what valid inputs look like, and an example correct call. Set a hard client-side retry limit (max 2-3 retries per tool) and force a strategy switch after exhausting retries. Log retry count in the conversation context so the agent can see it's looping.
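The retry limit and visible retry count described above could be sketched as follows. This is a minimal illustration, not a reference implementation: `RetryBudget`, its method names, and the wording of the injected note are all hypothetical, and the default of 2 retries is simply the low end of the suggested 2-3 range.

```python
from collections import defaultdict

MAX_RETRIES = 2  # assumed default; the report suggests 2-3 retries per tool


class RetryBudget:
    """Hypothetical helper: counts failed calls per tool on the client side."""

    def __init__(self, max_retries=MAX_RETRIES):
        self.max_retries = max_retries
        self.failures = defaultdict(int)

    def _allowed_attempts(self):
        # One initial attempt plus the permitted retries.
        return 1 + self.max_retries

    def exhausted(self, tool_name):
        """True once the tool has failed on every allowed attempt."""
        return self.failures.get(tool_name, 0) >= self._allowed_attempts()

    def record_failure(self, tool_name, error_text):
        """Count the failure and return a note to append to the conversation."""
        self.failures[tool_name] += 1
        count = self.failures[tool_name]
        note = (
            f"[attempt {count} of {self._allowed_attempts()} "
            f"for tool '{tool_name}' failed] {error_text}"
        )
        if self.exhausted(tool_name):
            # Force a strategy switch instead of another near-identical retry.
            note += (
                " Retry limit reached: do not call this tool again with"
                " similar arguments; try a different approach or ask the user."
            )
        return note

    def record_success(self, tool_name):
        """Reset the counter after a successful call."""
        self.failures.pop(tool_name, None)
```

The agent loop would check `exhausted(tool_name)` before dispatching a call and refuse to dispatch once it returns true. Because the returned note goes into the conversation context, the model sees the attempt count rather than only the latest error.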
Journey Context:
When a tool call fails with a generic error such as 'invalid input' or 'operation failed', the LLM tends to retry with minor parameter variations and enters a self-reinforcing loop. Each failed attempt consumes context window space (call + error + retry reasoning), which further degrades the agent's judgment. The MCP spec's isError flag distinguishes errors from successes but does not prescribe error message quality. Well-designed tools return structured errors with guidance; poorly designed ones return opaque strings. The loop is especially common with tools that depend on implicit state (e.g., 'file not found' when the agent assumes a different working directory). The counter-intuitive fix: better error messages in the tool are worth more than smarter retry logic in the agent.
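On the tool side, an actionable error might look like the sketch below. The function name `actionable_error`, its parameters, and the `read_file` tool in the usage are hypothetical. The returned dictionary is assumed to follow the general shape of an MCP tool result (an `isError` flag plus a list of text content items), so check it against the SDK you actually use.

```python
import json


def actionable_error(what_went_wrong, valid_input, example_call, context=None):
    """Hypothetical helper: build a self-contained error result for a tool.

    The message states what failed, what valid input looks like, and one
    example of a correct call, so the agent does not have to guess.
    """
    lines = [
        f"Error: {what_went_wrong}",
        f"Valid input: {valid_input}",
        f"Example call: {json.dumps(example_call)}",
    ]
    if context:
        # Surface implicit state (e.g. the working directory) explicitly.
        lines.append(f"Context: {context}")
    return {
        "isError": True,
        "content": [{"type": "text", "text": "\n".join(lines)}],
    }


# Usage with a hypothetical read_file tool that resolves relative paths.
result = actionable_error(
    what_went_wrong="file 'notes.txt' not found",
    valid_input="a path relative to the tool's working directory",
    example_call={"tool": "read_file", "arguments": {"path": "docs/notes.txt"}},
    context="working directory is /srv/project",
)
```

Stating the working directory in the error addresses the implicit-state case directly: the agent can correct the path on the next call instead of varying it blindly.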
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:03:20.024010+00:00— report_created — created