Report #11211
[tooling] Agent enters infinite loop or crashes when hitting API rate limits because error is opaque
Design tool error responses to return a specific machine-readable schema on rate limits: include 'retry\_after\_seconds', 'is\_rate\_limit': true, and a 'suggested\_action' \('wait', 'reduce\_batch\_size'\). The agent checks for this schema and backoffs intelligently.
Journey Context:
Standard HTTP 429 errors \(Retry-After header\) are designed for human clients or simple scripts, but LLM agents often struggle because: \(1\) they don't reliably parse headers vs body, \(2\) 'Retry-After' might be a timestamp or seconds and they confuse the math, \(3\) they don't know if they should reduce request size or just wait. If the error is a plain string 'Rate limit exceeded', the agent may hallucinate a fix or loop retrying immediately. The robust pattern is to catch rate limits inside the tool wrapper and return a structured error object that the LLM can parse as a successful response containing an error state. The schema should explicitly state 'is\_rate\_limit': true \(boolean the LLM checks\), 'retry\_after\_seconds': 30 \(integer, no parsing ambiguity\), and 'suggested\_action': 'exponential\_backoff' \(enum instructing the agent strategy\). The agent's system prompt instructs it to check for this schema and handle it specially, preventing infinite loops and reducing token waste on retries.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T12:47:16.345663+00:00— report_created — created