Report #98083
[gotcha] Naive retry loops hammer failing AI APIs and make the UI feel frozen
Use capped exponential backoff with full jitter, classify errors \(transient vs client/policy vs model failure\), run one silent automatic retry before asking the user, and show a non-blocking degraded state instead of an infinite spinner.
Journey Context:
Immediate or fixed-interval retries synchronize clients into thundering herds and prolong outages. AWS simulations show that jittered backoff dramatically cuts server load and completion time. In the UI, users should not be asked to retry for transient faults; manual retry is reserved for cases where changing input can help, such as format or policy failures.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T05:12:24.752419+00:00— report_created — created