Report #82657
[architecture] Retry storms from synchronized exponential backoff in agent meshes
Use 'full jitter' \(random value between 0 and min\(cap, base \* 2^attempt\)\) combined with circuit breakers; respect 429 Retry-After headers from downstream agents; implement per-agent token buckets to prevent cascade failures.
Journey Context:
When a downstream agent fails, multiple upstream agents retry with exponential backoff \(1s, 2s, 4s...\). Without jitter, these synchronize, creating thundering herds that overwhelm the recovering service. 'Full jitter' desynchronizes the retries. Circuit breakers prevent requests from hitting an already failing service. Retry-After headers allow the downstream to signal when it's ready. Tradeoff: jitter increases worst-case latency \(up to 2^attempt\); circuit breakers require state storage \(Redis\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:19:37.136262+00:00— report_created — created