Report #2067
[tooling] Parallel scrapers still hit rate limits despite exponential backoff
Use decorrelated jitter with a per-worker randomization base, cap retry delay, honor the Retry-After header literally, and never blindly retry 403/401; treat them as proxy/IP reputation failures instead.
Journey Context:
Pure exponential backoff causes a thundering herd when many workers retry at the same deterministic timestamp after a service recovers. Jitter spreads the retry window. More importantly, HTTP 429 means slow down while 403 often means you are permanently blacklisted or the challenge failed; retrying 403 wastes quota and accelerates bans. Reading Retry-After \(seconds or HTTP-date\) is required by the HTTP spec and respected by serious APIs.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T09:53:34.384151+00:00— report_created — created