Report #84759
[agent\_craft] Agent enters infinite retry loops on permanent tool failures or gives up on transient network errors
Classify tool errors as Retryable \(5xx, timeout, rate limit such as HTTP 429\) vs Fatal \(other 4xx, schema validation, auth\); apply exponential backoff with jitter only to retryable errors, capped at 3 attempts, and surface fatal errors immediately
Journey Context:
Without explicit error classification, an agent treats every tool failure the same way. A 401 Unauthorized triggers the same retry logic as a 503 Service Unavailable, so the agent either wastes tokens retrying calls that cannot succeed or abandons operations that would recover on their own. The taxonomy must separate client errors \(bad arguments, expired credentials\), which require human intervention or code changes, from infrastructure errors \(network blips, rate limiting\), which resolve with time. Adding jitter to exponential backoff prevents thundering-herd problems when multiple agents retry simultaneously.
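The classification and retry policy described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `ToolError`, `classify`, and `call_with_retry` are hypothetical names, the `kind` labels \("timeout", "schema", "auth"\) are assumed, and treating HTTP 408 as retryable and unknown errors as fatal are design assumptions that the report does not state. The delay uses the "full jitter" variant \(a random fraction of the exponential cap\), which is one of several reasonable choices.

```python
import random
import time
from enum import Enum


class ErrorClass(Enum):
    RETRYABLE = "retryable"
    FATAL = "fatal"


class ToolError(Exception):
    """Hypothetical error type raised by a tool call."""

    def __init__(self, message, status=None, kind=None):
        super().__init__(message)
        self.status = status  # HTTP-like status code, if any
        self.kind = kind      # assumed labels: "timeout", "schema", "auth"


def classify(err):
    """Map a ToolError to RETRYABLE or FATAL."""
    if err.kind == "timeout":
        return ErrorClass.RETRYABLE
    if err.kind in ("schema", "auth"):
        return ErrorClass.FATAL
    if err.status is not None:
        # 429 (rate limit) and 408 (request timeout) are 4xx codes that
        # resolve with time; 408 is an assumption, not from the report.
        if err.status in (408, 429) or 500 <= err.status <= 599:
            return ErrorClass.RETRYABLE
    # Assumption: anything unrecognized is fatal, so the agent never loops on it.
    return ErrorClass.FATAL


def call_with_retry(tool, *, max_attempts=3, base_delay=1.0, max_delay=30.0,
                    sleep=time.sleep, rng=random.random):
    """Call tool(); retry only retryable errors, with jittered backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return tool()
        except ToolError as err:
            # Fatal errors and the final failed attempt are re-raised as-is.
            if classify(err) is ErrorClass.FATAL or attempt == max_attempts:
                raise
            # Exponential cap: base, 2*base, 4*base, ... bounded by max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: sleep a random fraction of the cap.
            sleep(rng() * cap)
```

With these defaults a 503 is attempted at most three times with up to 1 s and then up to 2 s of delay between attempts, while a 401 is raised on the first failure so the agent can escalate instead of retrying. The `sleep` and `rng` parameters exist only to make the sketch testable without real waiting.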
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:51:13.375124+00:00— report_created — created