Report #2750
[tooling] Scrapy spider gets 429s or IP-banned because the crawl rate is too aggressive
Enable Scrapy’s built-in AutoThrottle extension instead of relying on a hand-tuned fixed DOWNLOAD_DELAY. In settings.py set AUTOTHROTTLE_ENABLED = True, AUTOTHROTTLE_START_DELAY = 1.0, AUTOTHROTTLE_MAX_DELAY = 30.0, and AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0. The extension measures response latency and adjusts the delay for each download slot (per domain by default) so that roughly one request is in flight at a time, backing off automatically when responses slow down.
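As a rough starting point, the settings.py fragment might look like the sketch below. The four AUTOTHROTTLE_* values come from this report. AUTOTHROTTLE_DEBUG, DOWNLOAD_DELAY, CONCURRENT_REQUESTS_PER_DOMAIN, and RETRY_TIMES are real Scrapy settings, but the values chosen for them here are assumptions to tune per target site.

```python
# settings.py (fragment)

# Values taken from this report.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0          # initial delay, in seconds
AUTOTHROTTLE_MAX_DELAY = 30.0           # upper bound when latency is high
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0   # aim for about one in-flight request

# Assumed values, not from the report: adjust for the site being crawled.
AUTOTHROTTLE_DEBUG = True               # log throttling stats per response
DOWNLOAD_DELAY = 0.5                    # lower bound for the adaptive delay
CONCURRENT_REQUESTS_PER_DOMAIN = 2      # hard cap on parallel requests
RETRY_TIMES = 3                         # retries for failed requests
```

With AUTOTHROTTLE_DEBUG on, the crawl log shows the delay and latency for each response, which makes it easier to check that the delay actually moves as the server speeds up or slows down.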
Journey Context:
Developers often guess a fixed DOWNLOAD_DELAY, which ends up either too slow for healthy servers or too fast for struggling ones. AutoThrottle treats latency as a load signal: slower responses raise the delay (up to AUTOTHROTTLE_MAX_DELAY), and faster responses let the spider speed up again. It is polite, reduces the risk of bans, and can improve throughput because it does not over-delay fast endpoints. Pair it with CONCURRENT_REQUESTS_PER_DOMAIN and RETRY_TIMES. Note that AutoThrottle still respects DOWNLOAD_DELAY: it acts as the floor the adaptive delay never drops below, so a very small value lets the spider run nearly unthrottled against low-latency servers, while a sensible value caps how fast it can go.
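The adjustment rule can be illustrated with a small standalone function. This is a simplified sketch of the policy described in the Scrapy AutoThrottle documentation, not Scrapy's own code; the name next_delay and its parameters are hypothetical and exist only for illustration.

```python
def next_delay(current_delay, latency, status,
               target_concurrency=1.0, min_delay=0.0, max_delay=30.0):
    """Return the delay to use after one response (illustrative only).

    min_delay plays the role of DOWNLOAD_DELAY and max_delay the role of
    AUTOTHROTTLE_MAX_DELAY; both names are assumptions for this sketch.
    """
    # A server that takes `latency` seconds per response should get one
    # request every latency / N seconds to keep N requests in flight.
    target = latency / target_concurrency

    # Move halfway toward the target, but jump straight to it when the
    # target is larger, so slowdowns are honoured immediately.
    new_delay = max(target, (current_delay + target) / 2.0)

    # Clamp between the floor and the ceiling.
    new_delay = min(max(min_delay, new_delay), max_delay)

    # Non-200 responses (error pages, redirects) are never allowed to
    # lower the delay, since they tend to come back quickly.
    if status != 200 and new_delay <= current_delay:
        return current_delay
    return new_delay


if __name__ == "__main__":
    delay = 1.0
    # Hypothetical (latency, status) observations for one domain.
    for latency, status in [(0.2, 200), (0.2, 429), (4.0, 200), (0.3, 200)]:
        delay = next_delay(delay, latency, status)
        print(f"latency={latency:.1f}s status={status} -> delay={delay:.2f}s")
```

The sketch shows why a fast 429 response does not speed the spider up: the quick error page is ignored for the purpose of lowering the delay, and only a slow response or a later healthy one changes it.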
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T13:53:05.911875+00:00— report_created — created