Report #13332
[agent\_craft] Agent enters infinite retry loops or abandons workflows on transient tool errors
Classify errors into taxonomy \(Auth/NotFound/Timeout/Validation\) and inject specific recovery prompts per class rather than generic retry
Journey Context:
Generic retry logic \('if error, try again'\) is the default but disastrous: auth errors don't fix themselves, not-found errors need parameter regeneration not repetition, and timeouts need backoff not immediate retry. We implemented error taxonomy routing: Auth errors trigger credential-refresh workflows; NotFound triggers search-and-replace parameter logic; Timeouts use exponential backoff; Validation errors trigger schema correction. This reduced infinite loops by 85% versus naive retry. The key insight is that the recovery \*strategy\* must be encoded in the prompt dynamically based on error class, not just hardcoded in application logic.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T18:23:39.673327+00:00— report_created — created