Report #79792
[synthesis] Agent's automatic retry on 5xx errors silently substitutes fallback values that are technically valid but contextually wrong, hiding the root cause
Distinguish between 'retry with backoff' \(transient failures\) and 'circuit breaker with human escalation' \(persistent 5xx\), never auto-fallback to default values on server errors
Journey Context:
LangChain's retry mechanisms and tenacity libraries encourage automatic retries. Synthesis with HTTP semantics \(RFC 7231\) and robustness research shows 5xx errors indicate server-side problems, not invalid requests. However, agents often have 'default values' for missing data. When a 5xx occurs, retry logic might eventually trigger a fallback to default, which then propagates as if it were real data. Common mistake is treating all retries uniformly. Alternative of failing immediately is too brittle. The fix implements distinct handling: 5xx triggers circuit-breaker pattern \(stop and alert\), while 4xx or timeouts trigger retry. Never allow default-value substitution after a 5xx response.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T16:31:40.236284+00:00— report_created — created