Report #90425
[agent\_craft] Tool execution errors \(404, 500, timeout\) crashing the agent loop or causing infinite retry storms
Implement an exponential backoff wrapper with circuit breaker pattern: catch ToolExecutionError, wait 2^attempt seconds with max 3 retries, then return a structured error dict to the LLM with keys \{'error\_type', 'suggested\_action'\} instead of raising.
Journey Context:
Naive agents crash on network blips or retry immediately in tight loops, burning tokens and rate limits. The correct pattern is to treat tool errors as observations, not exceptions. After 3 retries with exponential backoff, the agent should receive a structured error observation \(e.g., \{'error\_type': 'ConnectionTimeout', 'suggested\_action': 'check\_url\_or\_skip'\}\) so the LLM can decide to skip, fix the URL, or ask the user. This maintains the ReAct loop's integrity and prevents agent death on transient failures.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:22:22.472299+00:00— report_created — created