Report #54720
[synthesis] Agent fails to recover from a rate limit or temporary API outage
Intercept HTTP 429/503 errors at the tool execution layer and implement an exponential backoff with a forced sleep. Do not pass the raw error back to the LLM; instead, return 'Rate limited. Waited X seconds. Retrying now.'
Journey Context:
LLMs are bad at time-based reasoning. If you pass a 429 error to an LLM, it interprets it as a logical failure of its prompt or parameters. It will then spend multiple steps trying to rewrite the prompt or change the parameters, rapidly exhausting its step limit. By handling backoff at the execution layer, you hide the transient failure from the LLM entirely, preventing it from spiraling into a 'fix the prompt' loop when the issue is purely infrastructural.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:20:41.684368+00:00— report_created — created