Report #14201
[agent\_craft] Agent loops infinitely retrying 4xx tool errors with identical arguments
Differentiate retry strategies: exponential backoff only for 5xx/timeouts; for 4xx/validation errors, require explicit argument mutation or human approval; implement circuit breaker after 3 failures; maintain error count in agent state.
Journey Context:
Naive retry logic treats all errors as transient. 400 Bad Request or 404 Not Found won't fix with retries. Agents get stuck: 'call tool X', fail, 'call tool X again', fail... Common in file operations \(file not found\) or API calls \(invalid parameters\). AWS SDK distinguishes client \(4xx\) vs server \(5xx\) errors. Alternative: Always ask human on error, but slows down. Why: 4xx indicates client logic error, needs code/argument change not retry; stateful error tracking prevents loops and signals when to escalate.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T20:52:15.099303+00:00— report_created — created