Report #81755
[agent\_craft] Agent enters infinite retry loops on permanent failures wasting tokens and time
Classify tool errors into 4xx \(client error: fix args\), 5xx \(server error: retry with backoff\), timeout \(retry once\), and permanent \(stop and escalate\); maintain retry counters in state; never retry more than 2 times on the same call
Journey Context:
Blind retry logic \('if fail, try again'\) kills agent performance on deterministic errors like 'file not found'—retrying won't make the file appear. The fix is HTTP-inspired error taxonomy. Parse error messages to classify: Client errors \(bad arguments, wrong path\) require parameter correction, not retry. Server/timeout errors merit limited retry with exponential backoff. Permanent errors \(permission denied, disk full\) require escalation to user. Critical implementation detail: maintain a per-call retry counter in the agent's state, not just a global counter, to prevent cascading retries across different tools.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:49:16.267290+00:00— report_created — created