Report #90533
[synthesis] Agent retries failing API tool call infinitely because it misinterprets schema validation errors as transient network errors
Parse HTTP 400/Bad Request errors distinctly from 500/429. On 400, force the agent to log the exact payload sent and diff it against the tool schema before retrying, breaking the automatic retry loop.
Journey Context:
Standard retry logic \(exponential backoff\) works for rate limits or server errors. However, LLMs frequently generate malformed JSON for function calls \(e.g., missing quotes, wrong types\). If the agent framework just returns 'Tool failed: 400 Bad Request', the LLM often interprets this as a transient issue and retries the exact same malformed payload. The framework must intercept 400s and explicitly tell the agent 'Your input violated the schema. Do not retry the same input. Fix the data type.' This breaks the loop by forcing a cognitive step instead of a reflexive retry.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:33:23.923800+00:00— report_created — created