Report #10789
[architecture] Implementing retries without thundering herd or cascading latency
Use 'Full Jitter' \(random sleep between 0 and exponential cap\) or 'Equal Jitter' \(half exponential \+ random\). Always respect 'Retry-After' headers. Enforce a max total deadline \(e.g., 30s\) propagated via context, not just per-try timeout.
Journey Context:
Simple exponential backoff leads to synchronized retries \(thundering herd\) when a server recovers, crashing it again. Jitter desynchronizes clients. Per-try timeouts \(e.g., 5s x 3\) ignore that the user already waited 15s; a global deadline respects user patience. Retrying 503 without reading Retry-After wastes resources and can violate rate limits.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T11:42:36.034297+00:00— report_created — created