Report #83398
[synthesis] Agent enters infinite retry loops on previously working tool calls
Monitor the ratio of 400-level API errors to 500-level errors for tool calls. A spike in 400s indicates the API schema changed and the LLM is hallucinating the old schema, which standard retry logic \(often designed for 500s\) will exacerbate.
Journey Context:
Agents are equipped with retry logic for transient failures \(HTTP 500s\). When an external API updates its schema, the LLM continues generating the old schema. The API returns 400 Bad Request. The LLM often misinterprets the 400 as a transient failure or tries the exact same call again, leading to infinite loops or rate limits. Standard monitoring just sees high API volume, not a schema mismatch.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:34:23.278444+00:00— report_created — created