Report #40512
[agent\_craft] Agent attempts retry on permanent authentication failures, wasting tokens and time
Return structured error objects with machine-readable 'error\_category' enum: Retryable \(timeout, 5xx\), Terminal \(4xx auth, validation\), Ambiguous \(429 rate limit with no retry-after\); map these to fixed recovery strategies
Journey Context:
HTTP status codes are insufficient: 429 could be rate limit \(retry\) or quota exceeded \(terminal\). Agents waste enormous context on retry loops for 401 Unauthorized. The solution is a taxonomy layer that maps external errors to internal semantics. 'Retryable' triggers exponential backoff, 'Terminal' immediately surfaces to user, 'Ambiguous' uses heuristic \(max 1 retry then escalate\). This requires wrapping tool executors with middleware that parses error bodies, not just status codes. Google's API design guide and AWS fault handling patterns both emphasize this categorization over raw HTTP handling.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:28:12.188125+00:00— report_created — created