Report #52577
[agent_craft] Agent gets stuck in infinite retry loops on permanent failures or confuses transient errors with user errors
Implement an error taxonomy in the system prompt: classify each tool error as 'transient' (retry with backoff), 'user_fixable' (ask the user), or 'permanent' (abort), and map each category to its own handler instead of generic retry logic.
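A minimal sketch of the category-to-handler mapping, not a definitive implementation. It assumes the tool returns a dict with a hypothetical `ok` flag plus `category` and `message` fields. The function name `run_with_taxonomy`, the `action` values, and the retry defaults are illustrative only.

```python
import time
from typing import Callable


def run_with_taxonomy(
    tool: Callable[[], dict],
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Dispatch on the error category instead of retrying every failure.

    Assumed (hypothetical) result shape:
      success: {"ok": True, "value": ...}
      failure: {"ok": False, "category": "...", "message": "..."}
    """
    attempt = 0
    while True:
        result = tool()
        if result.get("ok"):
            return {"action": "done", "value": result.get("value")}

        category = result.get("category")
        message = result.get("message", "")

        if category == "transient" and attempt < max_retries:
            # Only transient errors are retried, with exponential backoff.
            sleep(base_delay * (2 ** attempt))
            attempt += 1
            continue

        if category == "user_fixable":
            # Hand control back to the user rather than retrying.
            return {"action": "ask_user", "message": message}

        # Permanent errors, unknown categories, and transient errors that
        # exhausted their retries all abort (an assumption of this sketch).
        return {"action": "abort", "message": message}
```

Treating unknown categories as an abort is a conservative choice here. A deployment that prefers to escalate unknown errors to the user could route them to `ask_user` instead.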
Journey Context:
Default error handling retries every failure 3 times with exponential backoff. This wastes tokens on permanent failures (404 Not Found) and stalls on user_fixable errors (permission denied), which no amount of retrying resolves. A taxonomy lets the agent give up immediately on permanent errors, ask for clarification on user_fixable ones (e.g., ambiguous paths), and retry only transient ones (network timeouts). This requires the tool wrapper to return structured errors such as `{category: 'transient', message: '...'}` rather than plain text. Tradeoff: the tool wrapper needs upfront error classification logic.
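One way the tool wrapper might produce structured errors is sketched below. The mapping from Python built-in exceptions to categories is an assumption made for illustration, as are the names `classify_exception` and `wrap_tool` and the extra `ok` and `value` fields. A real wrapper would classify whatever errors its tools actually raise.

```python
from typing import Any, Callable


def classify_exception(exc: Exception) -> str:
    """Map an exception to a taxonomy category (illustrative mapping only)."""
    # Network-style failures are treated as transient and worth retrying.
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return "transient"
    # Permission problems need the user to act, so retrying will not help.
    if isinstance(exc, PermissionError):
        return "user_fixable"
    # Assumption: anything unrecognized is permanent, so the agent fails fast.
    return "permanent"


def wrap_tool(fn: Callable[..., Any]) -> Callable[..., dict]:
    """Wrap a tool so it returns a structured result instead of raising."""

    def wrapped(*args: Any, **kwargs: Any) -> dict:
        try:
            return {"ok": True, "value": fn(*args, **kwargs)}
        except Exception as exc:
            return {
                "ok": False,
                "category": classify_exception(exc),
                "message": str(exc),
            }

    return wrapped
```

Defaulting unknown exceptions to 'permanent' keeps the agent from retrying errors it does not understand, at the cost of occasionally giving up on a failure that a retry would have fixed.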
Lifecycle
2026-06-19T18:44:40.532895+00:00— report_created — created