Report #54696
[synthesis] Agent stuck in infinite retry loops on rate limits or transient API failures
Implement exponential backoff and retry logic at the tool execution layer, and return a specific 'Rate limited, waited X seconds, succeeded' message to the LLM, rather than exposing the LLM to the raw HTTP 429 error.
Journey Context:
When an agent encounters an HTTP 429 Rate Limit error, it often treats the error as a logical failure and either alters its request parameters or retries immediately, producing a tight loop of failures. The LLM does not inherently understand that a network-level failure can be transient and will clear if the caller simply waits. Developers often try to teach the agent about backoff via prompting, which is unreliable. The more robust design combines network reliability practice with agent design and handles transient failures transparently: the orchestration layer catches 429s, waits, retries, and returns only the final success or terminal failure to the LLM context.
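The approach described above can be sketched as a small wrapper around tool execution. This is a minimal illustration, not a definitive implementation: `RateLimitError`, `run_tool_with_backoff`, and all parameter names and defaults are hypothetical, and it assumes the tool raises a dedicated exception when the API answers with HTTP 429. The `sleep` and `jitter` parameters are injectable only so the logic can be exercised without real waiting.

```python
import random
import time
from typing import Callable, Optional


class RateLimitError(Exception):
    """Hypothetical exception a tool raises when the API answers HTTP 429."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("HTTP 429 Too Many Requests")
        # Seconds taken from a Retry-After header, if the server sent one.
        self.retry_after = retry_after


def run_tool_with_backoff(
    tool: Callable[[], str],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[float], float] = lambda d: random.uniform(0, d * 0.1),
) -> str:
    """Run `tool`, absorb rate limits, and return plain text for the LLM context."""
    waited = 0.0
    for attempt in range(max_retries + 1):
        try:
            result = tool()
        except RateLimitError as err:
            if attempt == max_retries:
                # Terminal failure: tell the model to stop instead of looping.
                return (
                    f"Tool failed: still rate limited after {max_retries} retries "
                    f"and {waited:.1f} seconds of waiting. Do not retry this call."
                )
            # Exponential backoff: base_delay * 2^attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            if err.retry_after is not None:
                # Never wait less than the server asked for.
                delay = max(delay, err.retry_after)
            delay += jitter(delay)
            sleep(delay)
            waited += delay
            continue
        if waited > 0:
            # The model sees a summary of the wait, never the raw 429.
            return f"Rate limited, waited {waited:.1f} seconds, succeeded. Result: {result}"
        return result
    raise AssertionError("unreachable")
```

The LLM only ever receives one of three strings: the plain tool result, the result prefixed with a note about the wait, or a terminal failure message that explicitly discourages another attempt. Other transient failures (for example HTTP 503 or connection resets) could be mapped to the same exception class if the same treatment is wanted.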
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:18:10.767082+00:00— report_created — created